More languages to the mix

Dear users!

We are thrilled to announce that TopicAPI can now support virtually any language. You can already upload articles in German or Russian and get topics for them. We plan to add more languages in the coming weeks. Please let us know which language you would like most: https://goo.gl/forms/5RtE8ywHY2GYrTYH2

Note that we have updated the documentation to reflect the multi-language feature: you now need to supply a language code in the URL path so that your content is processed properly. See more here: https://market.mashape.com/dmitrykey/topicapi

Happy holidays, and may you find many insightful topics in your content!

Insider


We are extremely happy to announce that TopicAPI now supports virtually any language. For now these are Russian and German. Please let us know which language you would most like to see in the system first: https://goo.gl/forms/5RtE8ywHY2GYrTYH2

Please note that we have changed the API documentation: https://market.mashape.com/dmitrykey/topicapi

Happy holidays!

Insider

Sample code for grouping articles into themes

In this post we would like to share Java code snippets that load data into our Topic API. The Topic API lets you group your articles, posts, tweets or documents into topical themes and also search the content.

Search is one of the most common ways to navigate oceans of textual data and extract useful structure from your content lakes. But once you have found thousands and thousands of matches, you still face data overload. Topical grouping can help.

You can find all the code in this post in our public GitHub repository. We assume that your article content is stored in a MySQL database. We load the articles using MyBatis with the mapper at https://github.com/semanticanalyzer/nlproc_sdk_sample_code/blob/master/src/main/resources/mappers/ArticleEntryMapper.xml.

The TopicLoader class takes care of it all: loading articles from the DB, forming a JSON request to the Topic API and uploading the relevant fields.

This main class takes a single command-line argument, resource_id, which maps onto a field of the articles DB table in https://github.com/semanticanalyzer/nlproc_sdk_sample_code/blob/master/src/main/resources/mappers/ArticleEntryMapper.xml.

The method uploadDBEntries loads the article entries from the DB and uploads them to the Topic API one by one. Note that if your texts are small (tweet-sized), you can upload several posts in a single request (up to 50 at a time).
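For tweet-sized content, the batching idea could be sketched as follows. This is only an illustration, not code from the repository: the class UploadBatcher and its method partition are hypothetical names, and the only fact taken from the text is the limit of 50 posts per request.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper: splits posts into batches that respect the 50-per-request limit. */
final class UploadBatcher {

    /** Maximum number of posts per request, as stated in the post. */
    static final int MAX_POSTS_PER_REQUEST = 50;

    private UploadBatcher() {
    }

    /** Returns consecutive batches of {@code items}, each holding at most {@code batchSize} elements. */
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the sub-list view so each batch stays valid if the source list changes later.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<String> tweets = new ArrayList<>();
        for (int i = 1; i <= 120; i++) {
            tweets.add("tweet " + i);
        }
        for (List<String> batch : partition(tweets, MAX_POSTS_PER_REQUEST)) {
            // In a loader, each batch would be serialized into one JSON array and sent in one request.
            System.out.println("Would upload " + batch.size() + " posts in one request");
        }
    }
}
```

With 120 short posts this yields three requests (50, 50 and 20 posts) instead of 120 separate ones.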

To upload an article to the Topic API we use the following code:

 
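    import com.google.gson.Gson;
    import com.google.gson.GsonBuilder;
    import com.mashape.unirest.http.HttpResponse;
    import com.mashape.unirest.http.JsonNode;
    import com.mashape.unirest.http.Unirest;
    import com.mashape.unirest.http.exceptions.UnirestException;

    // Member of the loader class; mashapeKey and log are fields of that class.
    private void uploadArticleToTopicAPI(ArticleEntry x) {
        Gson gson = new GsonBuilder().create();

        try {
            // The endpoint expects a JSON array, so wrap the single article in brackets.
            String body = "[" + gson.toJson(x) + "]";

            System.out.println("Sending body for id: " + x.getId());

            HttpResponse<JsonNode> response = Unirest.post("https://dmitrykey-insiderapi-v1.p.mashape.com/articles/uploadJson")
                    .header("X-Mashape-Key", mashapeKey)
                    .header("Content-Type", "application/json")
                    .header("Accept", "application/json")
                    .body(body)
                    .asJson();

            System.out.println("TopicAPI response: " + response.getBody().toString());
        } catch (UnirestException e) {
            // Pass the exception as the last argument so the logger prints the stack trace.
            log.error("Failed to upload article {}", x.getId(), e);
            System.err.println("Error: " + e.getMessage());
        }
    }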

Remember that you need a mashapeKey: subscribe to the API and check the documentation, where you will find your key pre-inserted: https://market.mashape.com/dmitrykey/topicapi.

After the upload is complete, the articles end up in a search engine on the backend of the Topic API. You can then start sending search requests and getting back nice themes. In the example below I uploaded about 10,000 Russian texts and got these topics:

Что Говорят # What they say
Большие деньги # Big money
Беларуский Бренд # Belarusian brand
Для Ребенка # For a kid
Как Выглядит работа # What the job looks like
Как Заработать # How to earn money
В Минском Масс-маркете # In a Minsk mass-market store
Женщины # Women

Halloween launches

We are pleased to report new features in the Topic API, a system for real-time topic clustering of documents: articles, blog posts, tweets.


We have improved the algorithm so that cluster names no longer drop a preposition, which helps a lot for Russian.

We have added the ability to filter clusters by time range: use the parameters startDate and endDate in the format yyyy-MM-dd. This feature should allow you to build trends over time!

The /articles/cluster endpoint now supports GET, which is more sensible when integrating with frontends.
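As a rough sketch, a GET URL with a time range could be assembled like this. The class ClusterRequestUrl is a hypothetical name, and the host is assumed to be the same one used by the upload example in the earlier post. Only the path /articles/cluster, the parameter names startDate and endDate, and the yyyy-MM-dd format come from the announcement. The endpoint may need further parameters, so check the API documentation before relying on this.

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

/** Hypothetical helper that builds a time-filtered URL for the cluster endpoint. */
final class ClusterRequestUrl {

    // Assumption: same host as the upload example; verify against the API documentation.
    static final String BASE_URL = "https://dmitrykey-insiderapi-v1.p.mashape.com/articles/cluster";

    // yyyy-MM-dd is the date format named in the announcement.
    private static final DateTimeFormatter DATE_FORMAT = DateTimeFormatter.ofPattern("yyyy-MM-dd");

    private ClusterRequestUrl() {
    }

    /** Builds the URL for clusters between startDate and endDate. */
    static String build(LocalDate startDate, LocalDate endDate) {
        if (endDate.isBefore(startDate)) {
            throw new IllegalArgumentException("endDate must not be before startDate");
        }
        return BASE_URL
                + "?startDate=" + DATE_FORMAT.format(startDate)
                + "&endDate=" + DATE_FORMAT.format(endDate);
    }

    public static void main(String[] args) {
        // Example range: the month of October 2017 (arbitrary dates, for illustration only).
        System.out.println(build(LocalDate.of(2017, 10, 1), LocalDate.of(2017, 10, 31)));
    }
}
```

Requesting consecutive ranges (for example one URL per week) and comparing the returned clusters is one way to build the trends over time mentioned above.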

Enjoy and Happy Halloween!

We are pleased to report new features in the Topic API, a system for building topical clusters of documents: articles, blog posts, tweets.

We improved the algorithm so that cluster names include prepositions. For example, a cluster that used to be called "Питере" will now be called "В Питере" ("In St. Petersburg"), depending on your data, of course.

Clusters can now be built for a specific time range: use the parameters startDate and endDate in the format yyyy-MM-dd. Now you can build trends!

The /articles/cluster endpoint is now available via GET, a friendlier request type for integrating with frontends.

Good luck and have a great Halloween!

Insider team

Text analytics APIs: simplified pricing

We focus a lot on unifying access to our text analytics APIs. One such area is pricing. We want more users to have access to our systems at reasonable prices.

Over the last month we have unified and decreased prices for all our APIs. Here are the changes:

RSA API (entity level sentiment detection for Russian):
The overage fee for the Basic plan is now $0.02 (was $0.05). This matches the overage fee on all other plans.
The PRO plan is now $99 instead of $299.
The ULTRA plan is now $199 instead of $350.

FUXI API (sentiment detection for Chinese):
The Basic plan now allows 15,000 texts a month for just $10, instead of 500 texts a day.
The PRO plan allows 100,000 texts a month for $99.

Topic API (searchable topics for texts in Russian):
The Basic plan allows sending 1,000 messages for $19. Remember that one message can contain up to 50 texts, so if you were only uploading texts you could upload up to 50,000 of them (1,000 × 50).
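The message arithmetic for the Topic API Basic plan can be checked with a small sketch. The class TopicApiQuota and its methods are hypothetical names; the only inputs taken from the text are 1,000 messages per plan and up to 50 texts per message.

```java
/** Hypothetical helper for estimating how many Topic API messages an upload needs. */
final class TopicApiQuota {

    /** Up to 50 texts fit into one message, as stated above. */
    static final int TEXTS_PER_MESSAGE = 50;

    /** The Basic plan includes 1,000 messages, as stated above. */
    static final int BASIC_PLAN_MESSAGES = 1_000;

    private TopicApiQuota() {
    }

    /** Number of messages needed to upload the given number of texts, using full batches. */
    static long messagesNeeded(long texts) {
        if (texts < 0) {
            throw new IllegalArgumentException("texts must not be negative");
        }
        // Ceiling division: a partially filled last message still counts as one message.
        return (texts + TEXTS_PER_MESSAGE - 1) / TEXTS_PER_MESSAGE;
    }

    /** Largest number of texts that fits into the Basic plan if every message is full. */
    static long maxTextsOnBasicPlan() {
        return (long) BASIC_PLAN_MESSAGES * TEXTS_PER_MESSAGE;
    }

    public static void main(String[] args) {
        System.out.println("Messages for 10,000 texts: " + messagesNeeded(10_000));
        System.out.println("Max texts on Basic plan: " + maxTextsOnBasicPlan());
    }
}
```

For instance, 10,000 texts need 200 messages when each message is filled with 50 texts, which leaves 800 messages of the plan for other requests.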

The following APIs continue to be FREE:

ConnectedWords (find semantically similar English words to the ones given)
SemanticCloud (frequency word clouds for Russian along with lemmas)

Our team is always listening to you, our users. Let us know which additional APIs you would like to have, which features you would like in the existing APIs, and what volumes you would like to handle.

Enjoy the journey of extracting signal from your textual lakes!

AI education: what does the market require?

When you start looking at the field of AI (Artificial Intelligence) as a business leader or software developer, you can get lost at first.

In this online seminar with Carine Simon, MIT (Boston, USA); Borys Pratsyuk, Ciklum; Valeria Zabolotna, UNIT.City (Kiev, Ukraine); and Dmitry Kan, Insider (Helsinki, Finland), you will learn:

  1. What formal AI education programs exist at MIT
  2. What industry expects of hires for AI roles
  3. How to get started with AI as a practitioner: frameworks, hardware, communities

Seminar host: Misha Feldman

We hope you enjoy the video, and do let us know if it was helpful!

Making sense: The API is FREE. Use it today and give us your feedback!

What if you have a bunch of texts, or just one long text, in any language, and none of the summarisation tools work for you?

What if, on top of that, your texts come from social media or news sources and you cannot always trust how clean they are: do they contain URLs, hashtags, people's names, addresses and so on?

And what if you wanted to sift through the texts with a filter, such as: I want only nouns and verbs, to capture who did what.
Or: I want only adjectives and nouns, to capture in what light the texts describe arbitrary objects.
Or: I just want the links out of all the texts.

Now you can do all of that with one call to our SemanticCloud API.

Let’s pick the following tweet, which contains hashtags, a URL and a mix of two languages, Russian and English:
Я голосую за сильного президента, за сильную независимую Россию и за тех, кто привык спрашивать только с себя, а не винить в своей лени остальных! I vote for a strong president, for a strong Russia! #выборыпрезидента #RussiaElections2018 #ЯГолосую #ЗаПутина #Putin http://pic.twitter.com/zkY8axHqZA

And let’s ask the system to output nouns, verbs, adverbs, adjectives, names and hyperlinks.

The top two words by count are "strong" and "сильный" (a translation pair):

    {
        "word": "strong",
        "stem": "strong",
        "partOfSpeech": "Unknown",
        "count": 2,
        "lemma": false,
        "keyword": false
    },

    {
        "word": "сильный",
        "stem": "сильный",
        "partOfSpeech": "Adjective",
        "count": 2,
        "lemma": true,
        "keyword": false
    }

But we also parsed the words out of hashtags:

    {
        "word": "яголосую",
        "stem": "яголос",
        "partOfSpeech": "Unknown",
        "count": 1,
        "lemma": false,
        "keyword": false
    },

    {
        "word": "выборыпрезидента",
        "stem": "выборыпрезидент",
        "partOfSpeech": "Unknown",
        "count": 1,
        "lemma": false,
        "keyword": false
    }

and a URL:

    {
        "word": "http://pic.twitter.com/zky8axhqza",
        "stem": "http://pic.twitter.com/zky8axhqza",
        "partOfSpeech": "Hyperlink",
        "count": 1,
        "lemma": false,
        "keyword": false
    }

In addition, we can ask the API to return only the top N words (by frequency) along with lemmas (where applicable). More importantly, we can ask it to count a secret word that we are monitoring. Whether or not the secret word is present in the texts, it will be returned:

    {
        "word": "петербург",
        "stem": "петербург",
        "partOfSpeech": "Noun",
        "count": 0,
        "lemma": true,
        "keyword": true
    }
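Once the response is parsed, filtering it on the client side is straightforward. The sketch below is an illustration only: WordEntry and WordEntryFilters are hypothetical classes that mirror the fields shown in the JSON above, the sample data repeats entries from this post, and JSON parsing itself is left out.

```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical holder mirroring one entry of the response shown above. */
final class WordEntry {
    final String word;
    final String stem;
    final String partOfSpeech;
    final int count;
    final boolean lemma;
    final boolean keyword;

    WordEntry(String word, String stem, String partOfSpeech, int count, boolean lemma, boolean keyword) {
        this.word = word;
        this.stem = stem;
        this.partOfSpeech = partOfSpeech;
        this.count = count;
        this.lemma = lemma;
        this.keyword = keyword;
    }
}

/** Hypothetical client-side filters over already parsed entries. */
final class WordEntryFilters {

    private WordEntryFilters() {
    }

    /** Returns the n entries with the highest count; ties keep their original order. */
    static List<WordEntry> topN(List<WordEntry> entries, int n) {
        return entries.stream()
                .sorted(Comparator.comparingInt((WordEntry e) -> e.count).reversed())
                .limit(n)
                .collect(Collectors.toList());
    }

    /** Returns only the entries tagged with the given part of speech, e.g. "Hyperlink". */
    static List<WordEntry> byPartOfSpeech(List<WordEntry> entries, String partOfSpeech) {
        return entries.stream()
                .filter(e -> e.partOfSpeech.equals(partOfSpeech))
                .collect(Collectors.toList());
    }

    /** Sample data copied from the entries shown in this post. */
    static List<WordEntry> sample() {
        return Arrays.asList(
                new WordEntry("strong", "strong", "Unknown", 2, false, false),
                new WordEntry("сильный", "сильный", "Adjective", 2, true, false),
                new WordEntry("яголосую", "яголос", "Unknown", 1, false, false),
                new WordEntry("http://pic.twitter.com/zky8axhqza", "http://pic.twitter.com/zky8axhqza",
                        "Hyperlink", 1, false, false),
                new WordEntry("петербург", "петербург", "Noun", 0, true, true));
    }

    public static void main(String[] args) {
        for (WordEntry e : topN(sample(), 2)) {
            System.out.println(e.word + " (" + e.count + ")");
        }
        for (WordEntry e : byPartOfSpeech(sample(), "Hyperlink")) {
            System.out.println("link: " + e.word);
        }
    }
}
```

The same pattern extends to the keyword flag: filtering on it picks out the monitored word even when its count is 0.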

The API is FREE. Use it today and give us your feedback!


Insider team

Feedback API to improve sentiment detection algorithm

We are pleased to announce a new feature in the RussianSentimentAnalyzer API: a feedback endpoint. Using the endpoint, you can provide the correct sentiment label for a previously submitted text if you disagree with the API’s label. With this information we will automatically adjust the sentiment prediction once enough (text, correct label) pairs have accumulated.

So from now on you can train the algorithm behind the RussianSentimentAnalyzer API!

Have you had a chance to visit our brand new website? Please do visit and let us know what you think! https://semanticanalyzer.info/

Insider team