
New DocTop API: identify meaningful topics in documents in 8 languages

We are excited to share the news: we have just launched the new DocTop API for identifying and extracting meaningful topics from documents in English, French, German, Dutch, Italian, Portuguese, Greek and Spanish.

You can send multiple documents for analysis in a single JSON request. Only the first 10 documents in a request will be analyzed.

Curiosity rover exploring topics in text data

When do you want to extract topics from unstructured text? Here are a few reasons:

  • Find out which topics stand out most in your documents.
  • Identify trends. For this, you can systematically collect topics for new documents and compare them with the volumes of the same topics in the past.
  • Tag documents with topics to group them.
  • Allow searching within topics of interest to get focused search results.
  • Track topics to make sense of the textual data being produced and become more data-driven.
  • Correlate topics with other facets of your data, such as geography, user age brackets, gender, etc.
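
The trend idea from the list above can be sketched in a few lines of Java. This is only an illustration under stated assumptions: the class and method names (TopicTrends, countTopics, volumeChange) are hypothetical and not part of the DocTop API, and the topic lists are assumed to have been collected from API responses for two time periods.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the DocTop API: compares topic volumes between two periods.
final class TopicTrends {

    // Counts how often each topic occurs across a batch of per-document topic lists.
    static Map<String, Integer> countTopics(List<List<String>> topicsPerDocument) {
        Map<String, Integer> counts = new HashMap<>();
        for (List<String> topics : topicsPerDocument) {
            for (String topic : topics) {
                counts.merge(topic, 1, Integer::sum);
            }
        }
        return counts;
    }

    // Returns (current volume - past volume) for every topic seen in either period.
    static Map<String, Integer> volumeChange(Map<String, Integer> past, Map<String, Integer> current) {
        Map<String, Integer> change = new HashMap<>(current);
        past.forEach((topic, volume) -> change.merge(topic, -volume, Integer::sum));
        return change;
    }

    public static void main(String[] args) {
        // Made-up sample data standing in for topics collected last month and this month.
        Map<String, Integer> past = countTopics(List.of(List.of("robots"), List.of("robots", "apple")));
        Map<String, Integer> current = countTopics(List.of(List.of("robots"), List.of("startup")));
        System.out.println(volumeChange(past, current));
    }
}
```

A positive value means the topic is growing, a negative value means it is fading.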

Here is an example of how to use the API for English:

[
  {
    "article_id": 1,
    "text": "San Francisco considers banning sidewalk delivery robots"
  }
]

In response to the request above you’ll get an array of identified topics:

[
  {
    "article_id": 1,
    "topics": [
      "sidewalk delivery robots",
      "san francisco"
    ]
  }
]
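
Once responses like the one above are parsed, grouping documents by topic is a small inversion step. The sketch below assumes the response has already been parsed (with a JSON library of your choice) into a map from article_id to its topics; the class name TopicIndex and the second article are made up for illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: inverts article_id -> topics into topic -> article ids.
final class TopicIndex {

    static Map<String, List<Integer>> groupByTopic(Map<Integer, List<String>> topicsByArticle) {
        // TreeMap keeps topics sorted alphabetically, which is convenient for display.
        Map<String, List<Integer>> index = new TreeMap<>();
        topicsByArticle.forEach((articleId, topics) -> {
            for (String topic : topics) {
                index.computeIfAbsent(topic, t -> new ArrayList<>()).add(articleId);
            }
        });
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, List<String>> parsed = new LinkedHashMap<>();
        parsed.put(1, List.of("sidewalk delivery robots", "san francisco"));
        parsed.put(2, List.of("san francisco")); // made-up second article
        // Prints {san francisco=[1, 2], sidewalk delivery robots=[1]}
        System.out.println(groupByTopic(parsed));
    }
}
```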

And here is the request/response pair for a text in Spanish:

[
  {
    "article_id": 1,
    "text": "Apple está buscando comprar una startup del Reino Unido por mil millones de dólares"
  }
]

[
  {
    "article_id": 1,
    "topics": [
      "reino unido",
      "millones",
      "startup",
      "dólares",
      "apple"
    ]
  }
]

Subscribe to the DocTop API today and test it out on 100 free texts in any supported language!

Service Marathon

DATA ANALYSIS AND CLIENT EXPERIENCE MANAGEMENT

This past Saturday, June 22, our CEO took part in the Service Marathon held in Kyiv, Ukraine, joining remotely from New York City.

These were the main topics of the discussion:

• How and to what extent should client behavior be analyzed, and what should be done with large data sets (big data)?
• Where and how should data be collected, and how should the results of the analysis be interpreted and used? Which metrics should be used?
• Can data analysis help improve the company's service policy and customer service?

In particular, Dmitry paid a lot of attention to methods of text analysis for understanding customer feedback and for improving services and internal business policies to achieve a better and smoother client-centric service.

As a business dealing with text data generated by your customers or target audiences, you want to:

  • Evaluate the emotions and sentiment of your clients with the Fuxi API (for Chinese) and the RSA API (for Russian).
  • Extract the main topics of discussion to stay on top of what’s important right now using the DocTop API (supports a multitude of languages).
  • Order a comprehensive text understanding service from our specialists when these text facets are not enough, and improve your business processes.

Russian topics quality improved

Hello, dear users,

We are happy to announce improvements in the quality of topics produced for texts in Russian.

The trick with Russian and similar languages (Ukrainian, Finnish, Belarusian, Polish) is their rich morphology: words take many endings (surface forms) and grammatical cases, which can lead to abruptly cut topics or topics in the wrong case.

We have implemented additional analysis techniques that will compute base forms when applicable and remove prepositions on the boundaries of topics.
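
To make the boundary clean-up concrete, here is a minimal sketch of stripping prepositions from the edges of a topic. It is not the implementation used in the API: the class name, the method, and the tiny preposition list are assumptions for illustration, and computing base forms (which needs a morphological analyzer) is left out.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not the API's actual algorithm: trims prepositions at topic boundaries.
final class TopicBoundaryCleaner {

    // Tiny illustrative sample of Russian prepositions; a real list would be much longer.
    static final Set<String> SAMPLE_PREPOSITIONS = Set.of("в", "на", "по", "для", "с", "к", "о", "из");

    static String trimPrepositions(String topic, Set<String> prepositions) {
        List<String> words = new ArrayList<>(Arrays.asList(topic.trim().split("\\s+")));
        // Drop prepositions from the start of the topic.
        while (!words.isEmpty() && prepositions.contains(words.get(0).toLowerCase(Locale.ROOT))) {
            words.remove(0);
        }
        // Drop prepositions from the end of the topic.
        while (!words.isEmpty() && prepositions.contains(words.get(words.size() - 1).toLowerCase(Locale.ROOT))) {
            words.remove(words.size() - 1);
        }
        return String.join(" ", words);
    }

    public static void main(String[] args) {
        System.out.println(trimPrepositions("качество обслуживания в", SAMPLE_PREPOSITIONS));
    }
}
```

Prepositions inside a topic are left alone; only the leading and trailing ones are removed.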

Enjoy, and remember that topics are your gateway to categorizing large numbers of texts and bringing structure to unstructured data.

The update is immediately available for all our users.

Consume the Topic API today: https://rapidapi.com/dmitrykey/api/topicapi

Insider team

Array of texts and quality improvements

Hello and Happy New Year!


We are happy to let you know of three major changes to the RSA API for entity-level sentiment analysis of Russian texts:

  1. You can now send in an array of up to 10 texts. Use the new endpoint: https://russiansentimentanalyzer.p.mashape.com/rsa/sentiment/polarity/jsons/

Example of input with three texts:
 
[
  {
    "text": "Гиперответственный классный исполнитель :)\nОтдельный респект за подхваченное в 22-00 задание!",
    "article_id": 1,
    "include_strength": true
  },
  {
    "text": "быстро доставил,но претензии остались",
    "article_id": 2,
    "include_strength": true
  },
  {
    "text": "погода отличная"
  }
]

The response from the API carries polarity labels tagged with the original article_id values, if provided; otherwise, the results follow the input order of the texts:

  
[
    {
        "sentiment": "POSITIVE",
        "strength": 1,
        "article_id": "1"
    },
    {
        "sentiment": "NEUTRAL",
        "strength": 0,
        "article_id": "2"
    },
    {
        "sentiment": "POSITIVE",
        "strength": 1
    }
]
  2. We have tuned the quality for both positive and negative tonality.
  3. We are back to the freemium model, allowing you to send 100 texts a month for free.
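
On the client side, the article_id matching of the array endpoint could be handled roughly as follows. This is a sketch under assumptions: the response is presumed to be already parsed into simple objects, the names SentimentMatcher, Result and alignWithInputs are hypothetical, and ids are compared as strings because the sample response returns article_id as a string even though it was sent as a number.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper: lines up sentiment results with the texts that were sent.
final class SentimentMatcher {

    // One element of the response array; articleId is null when none was returned.
    static final class Result {
        final String sentiment;
        final String articleId;

        Result(String sentiment, String articleId) {
            this.sentiment = sentiment;
            this.articleId = articleId;
        }
    }

    // inputIds.get(i) is the article_id sent with text i (as a string), or null if none was sent.
    static List<String> alignWithInputs(List<String> inputIds, List<Result> results) {
        Map<String, Integer> positionById = new HashMap<>();
        for (int i = 0; i < inputIds.size(); i++) {
            if (inputIds.get(i) != null) {
                positionById.put(inputIds.get(i), i);
            }
        }
        String[] aligned = new String[inputIds.size()];
        for (int i = 0; i < results.size(); i++) {
            Result r = results.get(i);
            // Prefer the echoed article_id; fall back to the position in the response.
            Integer pos = r.articleId != null ? positionById.get(r.articleId) : Integer.valueOf(i);
            if (pos != null && pos < aligned.length) {
                aligned[pos] = r.sentiment;
            }
        }
        return Arrays.asList(aligned);
    }

    public static void main(String[] args) {
        List<String> inputIds = Arrays.asList("1", "2", null);
        List<Result> results = Arrays.asList(
                new Result("POSITIVE", "1"),
                new Result("NEUTRAL", "2"),
                new Result("POSITIVE", null));
        System.out.println(alignWithInputs(inputIds, results));
    }
}
```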

Enjoy!

Insider team

Hello and Happy New Year!

We are happy to announce three important improvements to the RSA API, a system for entity-level sentiment analysis of Russian-language texts:

  1. You can now send up to 10 texts in a single request. Use the new endpoint: https://russiansentimentanalyzer.p.mashape.com/rsa/sentiment/polarity/jsons/
    Example request:
 
[
  {
    "text": "Гиперответственный классный исполнитель :)\nОтдельный респект за подхваченное в 22-00 задание!",
    "article_id": 1,
    "include_strength": true
  },
  {
    "text": "быстро доставил,но претензии остались",
    "article_id": 2,
    "include_strength": true
  },
  {
    "text": "погода отличная"
  }
]

The system's response will contain the original article_id values or follow the original order of the texts:

  
[
    {
        "sentiment": "POSITIVE",
        "strength": 1,
        "article_id": "1"
    },
    {
        "sentiment": "NEUTRAL",
        "strength": 0,
        "article_id": "2"
    },
    {
        "sentiment": "POSITIVE",
        "strength": 1
    }
]
  2. The quality of recognizing positive and negative tonality has been improved.
  3. We have brought back the freemium model, which lets you send up to 100 texts a month for free!

Insider team

More languages to the mix

Dear users!

We are super thrilled to announce support for virtually any language in TopicAPI. Upload your articles and get topics for them in German or Russian. We plan to add more languages in the coming weeks. Please let us know which language you would like the most: https://goo.gl/forms/5RtE8ywHY2GYrTYH2

Note that we have updated the documentation to reflect the multi-language feature: you need to supply a language code in the URL path to make sure your content is properly processed. See more here: https://market.mashape.com/dmitrykey/topicapi

Happy holidays, and may you find many insightful topics in your content!

Insider


We are super happy to announce that TopicAPI now supports virtually any language. For now, these are Russian and German. Please let us know which language you would most like to see in the system: https://goo.gl/forms/5RtE8ywHY2GYrTYH2

Please note that we have changed the API documentation: https://market.mashape.com/dmitrykey/topicapi

Happy holidays!

Insider

Sample code for grouping articles into themes

In this post we would like to share Java code snippets for loading data into our Topic API. The idea of the Topic API is that it lets you group your articles / posts / tweets / documents into topical themes and also search in the content.

To navigate oceans of textual data and extract useful structures from your content lakes, search is one of the most common tools. But once you have found thousands and thousands of matches, you still face the problem of data overload. Topical grouping can help.

You can find all the code from this post in our public GitHub repository. We further assume that your article content is stored in a MySQL database. Using MyBatis, we load the articles with the mapper at https://github.com/semanticanalyzer/nlproc_sdk_sample_code/blob/master/src/main/resources/mappers/ArticleEntryMapper.xml.

The TopicLoader class takes care of it all: loading articles from the DB, forming a JSON request to the Topic API and uploading the relevant fields.

This main class takes a single command line argument: resource_id, which maps to a field of the articles DB table in https://github.com/semanticanalyzer/nlproc_sdk_sample_code/blob/master/src/main/resources/mappers/ArticleEntryMapper.xml.

The method uploadDBEntries loads the article entries from the DB and uploads them to the Topic API one by one. Note that if your content items are small (tweet size), you can upload several posts in a single request (up to 50 at a time).
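
For small posts, the batching mentioned above can be sketched as a plain partitioning step. The helper below is hypothetical and not taken from the sample repository; it only splits a list into chunks of at most a given size (50 in this case), and each chunk would then be serialized as one JSON array and sent in a single request.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the sample repository: splits entries into upload batches.
final class BatchUploader {

    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 1; i <= 120; i++) {
            ids.add(i);
        }
        // 120 entries with a limit of 50 per request give batches of 50, 50 and 20.
        for (List<Integer> batch : partition(ids, 50)) {
            System.out.println("Batch of " + batch.size() + " entries");
        }
    }
}
```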

To upload an article to the Topic API we use the following code:

 
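    // Imports needed at the top of the enclosing class file (Gson and Unirest 1.x):
    import com.google.gson.Gson;
    import com.google.gson.GsonBuilder;
    import com.mashape.unirest.http.HttpResponse;
    import com.mashape.unirest.http.JsonNode;
    import com.mashape.unirest.http.Unirest;
    import com.mashape.unirest.http.exceptions.UnirestException;

    private void uploadArticleToTopicAPI(ArticleEntry x) {
        GsonBuilder builder = new GsonBuilder();
        Gson gson = builder.create();

        try {
            // The endpoint expects a JSON array, so wrap the single article in brackets.
            String body = "[" + gson.toJson(x) + "]";

            System.out.println("Sending body for id: " + x.getId());

            // mashapeKey is the API key you get when subscribing to the Topic API.
            HttpResponse<JsonNode> response = Unirest.post("https://dmitrykey-insiderapi-v1.p.mashape.com/articles/uploadJson")
                    .header("X-Mashape-Key", mashapeKey)
                    .header("Content-Type", "application/json")
                    .header("Accept", "application/json")
                    .body(body)
                    .asJson();

            System.out.println("TopicAPI response:" + response.getBody().toString());
        } catch (UnirestException e) {
            // Pass the exception as the last argument so the logger prints the stack trace.
            log.error("Failed to upload article {}", x.getId(), e);
            System.err.println("Error: " + e.getMessage());
        }
    }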

Remember that you need to obtain the mashapeKey by subscribing to the API. The documentation shows your key already pre-inserted: https://market.mashape.com/dmitrykey/topicapi.

After the upload is complete, the articles end up in a search engine on the backend of the Topic API. You can start sending search requests and getting back nice themes. In the example below, I uploaded about 10,000 Russian texts and got the following topics:

Что Говорят # What they say
Большие деньги # Big money
Беларуский Бренд # Belarusian brand
Для Ребенка # For a kid
Как Выглядит работа # What the job looks like
Как Заработать # How to earn money
В Минском Масс-маркете # In a Minsk mass-market store
Женщины # Women

Halloween launches

We are pleased to report new features in the Topic API, a system for real-time topic clustering of documents: articles, blog posts, tweets.

We have improved the algorithm to avoid situations where a preposition would be omitted from a cluster name, which helps a lot for Russian.

We have added the ability to filter clusters by time range: use the parameters startDate and endDate in the format yyyy-MM-dd. This feature should allow you to build trends over time!

The /articles/cluster endpoint now supports GET, which is more sensible when integrating with frontends.
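
As an illustration of the date filter, the sketch below builds a GET URL for /articles/cluster with startDate and endDate formatted as yyyy-MM-dd. The class and method names are hypothetical, the base URL is passed in by the caller because the exact host is an assumption here, and the real endpoint may require additional parameters or headers described in the API documentation.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Hypothetical helper: builds a cluster query URL restricted to a date range.
final class ClusterQuery {

    // The date format named in the announcement.
    private static final DateTimeFormatter DATE = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    static String clusterUrl(String baseUrl, LocalDate startDate, LocalDate endDate) {
        if (endDate.isBefore(startDate)) {
            throw new IllegalArgumentException("endDate must not be before startDate");
        }
        return baseUrl + "/articles/cluster"
                + "?startDate=" + DATE.format(startDate)
                + "&endDate=" + DATE.format(endDate);
    }

    public static void main(String[] args) {
        // Placeholder host; substitute the API host from your subscription.
        String url = clusterUrl("https://example.invalid", LocalDate.of(2017, 10, 1), LocalDate.of(2017, 10, 31));
        System.out.println(url);
    }
}
```

Calling this once per week or month and comparing the returned clusters is one way to build the trends mentioned above.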

Enjoy and Happy Halloween!

We are happy to announce new features in the Topic API, a system for building topical clusters of documents: articles, blog posts, tweets.

We have improved the algorithm so that cluster names include prepositions. For example, a cluster that might previously have been named “Питере” will now be named “В Питере” ("In St. Petersburg"), depending on your data, of course.

Clusters can now be built for a specific time range: use the parameters startDate and endDate in the format yyyy-MM-dd. Now you can build trends!

The /articles/cluster endpoint is now available via GET, a friendlier request type for integrating with frontends.

Good luck and have a great Halloween!

Insider team