Category Archives: Release

Russian topics quality improved

Hello, dear users,

We are happy to announce improvements in the quality of topics produced for Russian texts.

The challenge with Russian and other morphologically rich languages (Ukrainian, Finnish, Belarusian, Polish) is the large number of word endings (surface forms) and grammatical cases, which can lead to fragmentary topics or topics in the wrong case.

We have implemented additional analysis techniques that compute base forms where applicable and remove prepositions at the boundaries of topics.
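The post does not describe the analysers themselves. As a rough illustration only, here is a minimal sketch of the two steps, assuming a tiny hand-written lemma dictionary and preposition list (both hypothetical; a real system would use a full morphological analyser). It trims prepositions from the start and end of a candidate topic and then maps each remaining word to its base form where one is known.

```python
# Hypothetical toy resources; a production system would use a real
# morphological analyser instead of a hand-written dictionary.
LEMMAS = {
    "сильного": "сильный",
    "сильную": "сильный",
    "президента": "президент",
    "россию": "россия",
}
PREPOSITIONS = {"в", "на", "о", "об", "с", "к", "по", "за", "для"}


def normalize_topic(tokens: list[str]) -> list[str]:
    """Lowercase tokens, strip boundary prepositions, then lemmatize."""
    words = [t.lower() for t in tokens]
    # Only prepositions at the left and right edges are removed;
    # prepositions inside the phrase are kept.
    while words and words[0] in PREPOSITIONS:
        words.pop(0)
    while words and words[-1] in PREPOSITIONS:
        words.pop()
    # Replace each word with its base form when the dictionary knows it.
    return [LEMMAS.get(w, w) for w in words]


print(normalize_topic(["за", "сильного", "президента"]))  # ['сильный', 'президент']
```

With a dictionary-based lemmatizer, unknown words simply pass through unchanged, which matches the "where applicable" caveat.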

Enjoy, and remember that topics are your gateway to categorizing large numbers of texts and bringing structure to unstructured data.

The update is immediately available for all our users.

Consume the Topic API today:

Insider team

More languages in the mix

Dear users!

We are super thrilled to announce support for virtually any language in TopicAPI. Upload your articles in German or Russian and get topics for them. We plan to add more languages in the coming weeks. Please let us know which language you would like most:

Note that we have updated the documentation to reflect the multi-language feature: you need to supply a language code in the URL path so that your content is processed correctly. See more here:
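The exact URL layout lives in the documentation, which is not reproduced here. The sketch below only shows the general idea of carrying a language code as a path segment; the base URL, the position of the segment and the set of codes are all assumptions, not the real TopicAPI paths.

```python
# Hypothetical base URL and path layout; only the idea of a language code
# in the URL path comes from the announcement.
BASE_URL = "https://api.example.com/topicapi"
SUPPORTED_LANGS = {"en", "de", "ru"}  # assumed two-letter language codes


def endpoint_for(lang: str, resource: str) -> str:
    """Build a URL that carries the language code as a path segment."""
    lang = lang.lower()
    if lang not in SUPPORTED_LANGS:
        raise ValueError(f"unsupported language code: {lang!r}")
    return f"{BASE_URL}/{lang}/{resource.lstrip('/')}"


print(endpoint_for("de", "/articles/cluster"))
# https://api.example.com/topicapi/de/articles/cluster
```

Validating the code on the client side fails fast on a typo instead of silently sending German text to the wrong language pipeline.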

Happy holidays, and may your content yield many insightful topics!



We are super happy to announce that TopicAPI now supports virtually any language. Currently that means Russian and German. Please let us know which language you would most like to see in the system first:

Please note that we have changed the API documentation:

Happy holidays!


Halloween launches

We are pleased to announce new features in the Topic API, a system for real-time topic clustering of documents: articles, blog posts and tweets.


We have improved the algorithm so that prepositions are no longer omitted from cluster names. This helps a lot for Russian.
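The notes do not say how the improved algorithm decides when to keep a preposition. One simple way to get this effect, sketched here under that caveat, is to widen a cluster label's token span one word to the left whenever that word is a preposition. The preposition list and the function name are hypothetical.

```python
# Hypothetical preposition list, for illustration only.
PREPOSITIONS = {"в", "на", "о", "с", "к", "по", "за", "из", "у"}


def label_with_preposition(tokens: list[str], start: int, end: int) -> str:
    """Return tokens[start:end] as a label, pulling in a preceding preposition.

    If the word just before the span is a preposition, it is included,
    so a label like "Питере" becomes "В Питере".
    """
    if start > 0 and tokens[start - 1].lower() in PREPOSITIONS:
        start -= 1
    return " ".join(tokens[start:end])


tokens = ["В", "Питере", "прошла", "встреча"]
print(label_with_preposition(tokens, 1, 2))  # В Питере
```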

We have added the ability to filter clusters by time range: use the startDate and endDate parameters in yyyy-MM-dd format. This lets you build trends over time!
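To build a trend, you would typically query consecutive time windows and compare the clusters from each. Here is a minimal sketch that produces the startDate/endDate query strings for weekly windows. The parameter names and date format come from this release; the weekly windowing and helper names are illustrative assumptions.

```python
from datetime import date, timedelta
from urllib.parse import urlencode


def date_range_params(start: date, end: date) -> str:
    """Encode a time window as startDate/endDate query parameters."""
    if start > end:
        raise ValueError("startDate must not be after endDate")
    # %Y-%m-%d is Python's spelling of the yyyy-MM-dd pattern.
    return urlencode({
        "startDate": start.strftime("%Y-%m-%d"),
        "endDate": end.strftime("%Y-%m-%d"),
    })


def windows(start: date, end: date, days: int):
    """Yield consecutive (start, end) windows of `days` days covering [start, end]."""
    if days < 1:
        raise ValueError("days must be at least 1")
    current = start
    while current <= end:
        stop = min(current + timedelta(days=days - 1), end)
        yield current, stop
        current = stop + timedelta(days=1)


# One query string per week of October 2017, e.g. to plot a trend.
for s, e in windows(date(2017, 10, 1), date(2017, 10, 31), 7):
    print(date_range_params(s, e))
```

The last window is clipped to the end date, so the final week of a month may be shorter than the others.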

We have made the /articles/cluster endpoint support GET, which is more convenient when integrating with frontends.
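A GET request means all parameters travel in the query string, which is easy to build from a browser or a small script. This sketch prepares, but does not send, such a request with the standard library. The host is a placeholder; only the /articles/cluster path and the date parameters come from the release notes.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical host; only the /articles/cluster path comes from the release notes.
BASE_URL = "https://api.example.com"


def cluster_request(params: dict[str, str]) -> Request:
    """Prepare (but do not send) a GET request to /articles/cluster."""
    url = f"{BASE_URL}/articles/cluster?{urlencode(params)}"
    return Request(url, method="GET")


req = cluster_request({"startDate": "2017-10-01", "endDate": "2017-10-31"})
print(req.get_method(), req.full_url)
# GET https://api.example.com/articles/cluster?startDate=2017-10-01&endDate=2017-10-31
```

Passing `req` to `urllib.request.urlopen` would actually send it; that step is left out because the real host and any authentication are not given here.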

Enjoy and Happy Halloween!

We are happy to announce new features in the Topic API, a system for building topic clusters of documents: articles, blog posts and tweets.

We have improved the algorithm so that cluster names include prepositions. For example, a cluster that previously might have been named “Питере” (“Piter”, colloquial for St. Petersburg, in the prepositional case) will now be named “В Питере” (“in Piter”), depending on your data, of course.

Clusters can now be built for a specific time range: use the startDate and endDate parameters in yyyy-MM-dd format. Now you can build trends!

The /articles/cluster endpoint is now available via GET, a friendlier request type for integrating with frontends.

Good luck, and have a great Halloween!

Insider team

Making sense: The API is FREE. Use it today and give us your feedback!

What if you have a bunch of texts, or just one long text, in any language, and none of the summarisation tools work for you?

Perhaps you have also dealt with texts from social media or news sources that are not always clean: they may be full of noise such as URLs, hashtags, people's names, addresses and so on.

And what if you wanted to sift through the texts with a filter? For example, you might want only nouns and verbs, to capture who did what.
Or only adjectives and nouns, to capture which colours the texts attribute to arbitrary objects.
Or just the links from all the texts.

Now you can do all of that with one call to our SemanticCloud API.

Let's take the following tweet, which contains hashtags, a URL and a mix of two languages, Russian and English:
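Я голосую за сильного президента, за сильную независимую Россию и за тех, кто привык спрашивать только с себя, а не винить в своей лени остальных! I vote for a strong president, for a strong Russia! #выборыпрезидента #RussiaElections2018 #ЯГолосую #ЗаПутина #Putin
(Translation of the Russian part: “I vote for a strong president, for a strong, independent Russia, and for those who are used to holding only themselves to account rather than blaming others for their own laziness!” The hashtags read #presidentialelections, #IVote and #ForPutin.)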

Now let's ask the system to output nouns, verbs, adverbs, adjectives, names and hyperlinks.
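The selection is made by the API itself, but the same idea is easy to apply on the client side to records shaped like the ones in this post. A minimal sketch of the “only nouns and verbs” filter follows. The labels "Noun", "Adjective", "Hyperlink" and "Unknown" appear in the sample output; "Verb", "Adverb" and "Name" are assumed spellings.

```python
# "Noun", "Adjective", "Hyperlink" and "Unknown" appear in the sample output;
# "Verb", "Adverb" and "Name" are assumed label spellings.
WANTED = {"Noun", "Verb", "Adverb", "Adjective", "Name", "Hyperlink"}


def filter_by_pos(words: list[dict], wanted: set[str]) -> list[dict]:
    """Keep only word records whose partOfSpeech is in the wanted set."""
    return [w for w in words if w.get("partOfSpeech") in wanted]


sample = [
    {"word": "сильный", "partOfSpeech": "Adjective", "count": 2},
    {"word": "strong", "partOfSpeech": "Unknown", "count": 2},
    {"word": "петербург", "partOfSpeech": "Noun", "count": 0},
]
# "Who did what": nouns and verbs only.
print([w["word"] for w in filter_by_pos(sample, {"Noun", "Verb"})])  # ['петербург']
```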

The two top words by count are “strong” and “сильный” (a translation pair):

{
  "word": "strong",
  "stem": "strong",
  "partOfSpeech": "Unknown",
  "count": 2,
  "lemma": false,
  "keyword": false
}

{
  "word": "сильный",
  "stem": "сильный",
  "partOfSpeech": "Adjective",
  "count": 2,
  "lemma": true,
  "keyword": false
}
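If the response is a JSON array of objects with the fields shown here (an assumption; the post shows only individual entries), a typed record makes the output easier to work with. This sketch keeps the camelCase field names so that JSON keys map directly, and it ignores any extra keys the API might add.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class WordInfo:
    """One SemanticCloud entry; field names mirror the JSON keys in the post."""
    word: str
    stem: str
    partOfSpeech: str
    count: int
    lemma: bool
    keyword: bool


FIELD_NAMES = {f.name for f in fields(WordInfo)}


def parse_words(payload: str) -> list[WordInfo]:
    """Parse a JSON array of word objects, dropping unknown keys."""
    return [
        WordInfo(**{k: v for k, v in item.items() if k in FIELD_NAMES})
        for item in json.loads(payload)
    ]


raw = '[{"word": "сильный", "stem": "сильный", "partOfSpeech": "Adjective", "count": 2, "lemma": true, "keyword": false}]'
words = parse_words(raw)
print(words[0].word, words[0].count)  # сильный 2
```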

We also parsed words out of the hashtags:

{
  "word": "яголосую",
  "stem": "яголос",
  "partOfSpeech": "Unknown",
  "count": 1,
  "lemma": false,
  "keyword": false
}

{
  "word": "выборыпрезидента",
  "stem": "выборыпрезидент",
  "partOfSpeech": "Unknown",
  "count": 1,
  "lemma": false,
  "keyword": false
}
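The entries above suggest that a hashtag becomes a lowercased word without the leading “#”. A minimal sketch of that extraction step is shown below; stemming (“яголосую” to “яголос”) is a separate step and is not reproduced. In Python 3, `\w` in a `str` pattern matches Cyrillic as well as Latin letters and digits.

```python
import re

# In Python 3 str patterns, \w matches Unicode letters (Cyrillic included),
# digits and underscore.
HASHTAG = re.compile(r"#(\w+)")


def hashtag_words(text: str) -> list[str]:
    """Return hashtag bodies without '#', lowercased (e.g. #ЯГолосую -> яголосую)."""
    return [tag.lower() for tag in HASHTAG.findall(text)]


tweet = "I vote for a strong president! #выборыпрезидента #RussiaElections2018 #ЯГолосую"
print(hashtag_words(tweet))
# ['выборыпрезидента', 'russiaelections2018', 'яголосую']
```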

and a URL:

{
  "word": "",
  "stem": "",
  "partOfSpeech": "Hyperlink",
  "count": 1,
  "lemma": false,
  "keyword": false
}
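Pulling out links is the simplest of these filters. A deliberately basic sketch is shown below; it is not the API's own detector, and it would also keep trailing punctuation such as a closing bracket, which a production pattern would need to handle.

```python
import re

# Deliberately simple: "http://" or "https://" followed by non-whitespace.
URL = re.compile(r"https?://\S+")


def hyperlinks(text: str) -> list[str]:
    """Extract http(s) links from free text."""
    return URL.findall(text)


print(hyperlinks("Read more: https://example.com/post?id=1 #news"))
# ['https://example.com/post?id=1']
```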

In addition, we can ask the API to return only the top N words by frequency, along with their lemmas where applicable. More importantly, we can ask it to count a secret word that we are monitoring. Whether or not the secret word occurs in the texts, it is always returned:

{
  "word": "петербург",
  "stem": "петербург",
  "partOfSpeech": "Noun",
  "count": 0,
  "lemma": true,
  "keyword": true
}
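The top-N plus monitored-keyword behaviour can be sketched with `collections.Counter`: take the N most frequent tokens, then append every monitored keyword that did not make the cut, with a count of 0 if it never occurs. This assumes the tokens are already lemmatized and is an illustration, not the API's implementation.

```python
from collections import Counter


def top_words(tokens: list[str], n: int, keywords: set[str]) -> list[dict]:
    """Return the n most frequent tokens plus every monitored keyword.

    Keywords are always reported, with count 0 if they never occur,
    mirroring the "keyword": true entry in the sample output.
    """
    counts = Counter(tokens)
    result = [
        {"word": w, "count": c, "keyword": w in keywords}
        for w, c in counts.most_common(n)
    ]
    reported = {r["word"] for r in result}
    for kw in sorted(keywords - reported):
        # Counter returns 0 for missing keys.
        result.append({"word": kw, "count": counts[kw], "keyword": True})
    return result


tokens = ["сильный", "strong", "сильный", "strong", "россия"]
print(top_words(tokens, 2, {"петербург"}))
```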

The API is FREE. Use it today and give us your feedback!


Insider team

Feedback API to improve sentiment detection algorithm

We are pleased to announce a new feature in the RussianSentimentAnalyzer API: a feedback endpoint. If you disagree with the label the API assigned to a previously submitted text, you can use this endpoint to provide the correct sentiment label. Once enough (text, correct label) pairs have accumulated, we will automatically use them to improve sentiment prediction.
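The announcement does not say how the accumulated corrections are handled internally. As a hedged illustration of the “collect until there are enough” idea, here is a minimal sketch of a buffer that stores (text, correct label) pairs and signals when a threshold is reached. The label set, the threshold value and the class name are all assumptions.

```python
from dataclasses import dataclass, field

VALID_LABELS = {"positive", "neutral", "negative"}  # assumed label set


@dataclass
class FeedbackBuffer:
    """Collects (text, correct label) pairs until enough exist to retrain.

    The default threshold is a hypothetical value; the release notes only
    say that "enough" pairs are accumulated before the model is adjusted.
    """
    threshold: int = 1000
    pairs: list[tuple[str, str]] = field(default_factory=list)

    def add(self, text: str, correct_label: str) -> bool:
        """Store one correction; return True when retraining should run."""
        if correct_label not in VALID_LABELS:
            raise ValueError(f"unknown label: {correct_label!r}")
        self.pairs.append((text, correct_label))
        return len(self.pairs) >= self.threshold

    def drain(self) -> list[tuple[str, str]]:
        """Hand over the collected pairs (e.g. to a training job) and reset."""
        batch, self.pairs = self.pairs, []
        return batch


buf = FeedbackBuffer(threshold=2)
buf.add("Отличный сервис", "positive")
if buf.add("Ужасная поддержка", "negative"):
    print(len(buf.drain()), "pairs ready for retraining")
```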

So from now on you can train the algorithm behind the RussianSentimentAnalyzer API!

Have you had a chance to visit our brand-new website? Please do, and let us know what you think!

Insider team

NEW API: ConnectedWords

Hello and Happy New Year!

New Year, new API. We have launched a new API called ConnectedWords. We trained a neural network using the word2vec approach on a collection of English texts. As input, you supply an array of keywords, and the API returns a list of words connected (related) to them.
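Related words in a word2vec model are commonly found by ranking vocabulary vectors by cosine similarity to the query vector. The post does not state how ConnectedWords scores words, so treat this as an assumption. The sketch averages the vectors of the input keywords (since the API accepts an array) and ranks the rest of a toy vocabulary. The 3-dimensional vectors are made up; real embeddings have hundreds of dimensions learned from a corpus.

```python
import math

# Made-up toy vectors; real word2vec embeddings are learned from text
# and typically have a few hundred dimensions.
VECTORS = {
    "launch": [0.9, 0.1, 0.0],
    "rocket": [0.8, 0.2, 0.1],
    "orbit": [0.7, 0.1, 0.3],
    "banana": [0.0, 0.1, 0.9],
}


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def connected_words(words: list[str], k: int = 2) -> list[tuple[str, float]]:
    """Rank other vocabulary words by cosine similarity to the mean input vector."""
    dims = len(next(iter(VECTORS.values())))
    query = [sum(VECTORS[w][i] for w in words) / len(words) for i in range(dims)]
    scored = [(w, cosine(query, v)) for w, v in VECTORS.items() if w not in words]
    return sorted(scored, key=lambda p: p[1], reverse=True)[:k]


print(connected_words(["launch"]))
```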


Available end-points:

Here is an example:

For the word “launch”, the API produces the following connected words, each followed by its similarity score:

[
  "launched 0.5948931514907372",
  "ariane 0.5640206606244647",
  "icbm 0.532163213444619",
  "canaveral 0.5222400316699805",
  "rocket 0.5168188279637889",
  "launcher 0.5066764146199603",
  "suborbital 0.4987842348018603",
  "landing 0.49743730683360354",
  "expendable 0.49456818497947097",
  "agena 0.49325088465809586",
  "orbiter 0.4930563861239534",
  "shuttle 0.48127536803463045",
  "unmanned 0.47977178154360445",
  "launches 0.47013505662020805",
  "sputnik 0.4690193780888272",
  "bomarc 0.46608954818339043",
  "mission 0.4622460565342408",
  "redstone 0.4509777243147255",
  "gliders 0.4493604525398496",
  "missile 0.4388378398880377",
  "abort 0.4322835796211848",
  "rockets 0.4255249811253634",
  "lgm 0.42401975940492775",
  "launching 0.42055305756491634",
  "spacecraft 0.42044358977136653",
  "warhead 0.4203600640856848",
  "manned 0.4196165464952628",
  "skylab 0.417352627778655",
  "spaceflight 0.41261142646271765",
  "payloads 0.41167406251520333",
  "operational 0.41030200304930986",
  "refueling 0.41015588246409607",
  "orbit 0.4054650313323691",
  "extravehicular 0.4040691414909361",
  "icbms 0.4037563327101452",
  "hotol 0.4027989227897706",
  "sts 0.400049473907643",
  "saturn 0.399919637824496",
  "payload 0.398525218766963",
  "bm 0.3965859062493564"
]
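Each entry in the example is a single string holding a word and its score. Assuming the API returns entries in that shape, this sketch splits them into (word, score) pairs and keeps only those above a chosen cut-off, which is handy when you only want the strongest associations.

```python
def parse_connected(entries: list[str], min_score: float = 0.0) -> list[tuple[str, float]]:
    """Split "word score" strings into (word, score) pairs above a cut-off."""
    pairs = []
    for entry in entries:
        # The score is the last space-separated field; rsplit keeps any
        # spaces inside the word itself.
        word, score = entry.rsplit(" ", 1)
        if float(score) >= min_score:
            pairs.append((word, float(score)))
    return pairs


sample = ["launched 0.5948931514907372", "ariane 0.5640206606244647", "bm 0.3965859062493564"]
print(parse_connected(sample, min_score=0.5))
```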

How can one use the API?

1. Make your search engine smarter: expand the result set to include documents containing related words. This helps solve the problem of zero-hit searches.

2. Spice up your writing. Are you a journalist, blogger or student who would like to add some flavour to your text? Send in a few words and get back a set of words that might help make your writing more interesting and engaging.
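Use case 1 is classic query expansion. In the sketch below, the related words for “launch” are taken from the example output above. How you obtain and cache them from ConnectedWords, and the toy word-overlap search, are illustrative assumptions. A query that finds nothing on its own finds a relevant document once it is expanded.

```python
def expand_query(terms: list[str], related: dict[str, list[str]], per_term: int = 3) -> list[str]:
    """Add up to `per_term` related words for each query term, without duplicates."""
    expanded = list(terms)
    for term in terms:
        for word in related.get(term, [])[:per_term]:
            if word not in expanded:
                expanded.append(word)
    return expanded


def search(docs: dict[str, str], terms: list[str]) -> list[str]:
    """Return ids of documents that contain at least one of the terms."""
    hits = []
    for doc_id, text in docs.items():
        if set(text.lower().split()).intersection(terms):
            hits.append(doc_id)
    return hits


# Related words as they might be cached from ConnectedWords (hypothetical cache).
related = {"launch": ["launched", "rocket", "canaveral"]}
docs = {"d1": "The rocket lifted off from Canaveral", "d2": "Lunch menu for Friday"}
print(search(docs, ["launch"]))                         # []
print(search(docs, expand_query(["launch"], related)))  # ['d1']
```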

In the future, we would like to add support for other languages and to train on different types of text, such as social media, news and blogs. If you have more ideas on how to make the system more useful for your needs, get in touch!


Fuxi API: Normalized sentiment strength (release 1.5.2)

We are pleased to announce release 1.5.2 of the Fuxi API for Chinese sentiment analysis. In this release, we have bounded the sentiment strength (previously an unbounded integer value) to the range [-1, 1]. The value is now a normalized floating-point number.
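The release notes do not say which normalization Fuxi uses. To illustrate the general idea only, here is one common way to squash an unbounded integer score into [-1, 1] while keeping its sign: a scaled hyperbolic tangent. The `scale` constant is an assumption, not a documented Fuxi parameter.

```python
import math


def normalize_strength(raw: int, scale: float = 5.0) -> float:
    """Squash an unbounded integer score into [-1, 1].

    tanh keeps the sign (negative vs. positive), maps 0 to 0.0, and
    saturates towards -1 or 1 for large magnitudes. `scale` is an
    assumed tuning constant that controls how quickly it saturates.
    """
    return math.tanh(raw / scale)


for raw in (-12, -3, 0, 3, 12):
    print(raw, round(normalize_strength(raw), 3))
```

A bounded score like this can be compared across texts of different lengths, which an unbounded integer sum does not allow.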

We hope you enjoy using the API. Let us know any feedback or suggestions you might have!

Insider team