Sample code for grouping articles into themes

In this post we share Java code snippets for loading data into our Topic API. The Topic API lets you group your articles, posts, tweets, or other documents into topical themes and also search their content.

Search is one of the most common ways to navigate oceans of textual data and extract useful structure from your content lakes. But once a search has returned thousands of matches, you still face the problem of data overload. Topical grouping can help.

All the code in this post is available in our public GitHub repository. We assume that your article content is stored in a MySQL database. We load the articles using MyBatis, with the mapper defined in https://github.com/semanticanalyzer/nlproc_sdk_sample_code/blob/master/src/main/resources/mappers/ArticleEntryMapper.xml.

The TopicLoader class takes care of everything: loading articles from the DB, forming a JSON request to the Topic API, and uploading the relevant fields.

This main class takes a single command-line argument, resource_id, which corresponds to a field of the articles DB table referenced in https://github.com/semanticanalyzer/nlproc_sdk_sample_code/blob/master/src/main/resources/mappers/ArticleEntryMapper.xml.
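As a rough illustration of handling that single argument, here is a minimal sketch. The class name ResourceIdArgs, the method parse, and the usage message are hypothetical and are not taken from the repository; the real TopicLoader may validate its input differently. The sketch also assumes resource_id can be treated as an opaque string.

```java
// Hypothetical helper, not part of the sample repository.
final class ResourceIdArgs {
    private ResourceIdArgs() {}

    // Returns the trimmed resource_id, or fails fast if the argument
    // is missing, blank, or accompanied by extra arguments.
    static String parse(String[] args) {
        if (args == null || args.length != 1 || args[0].trim().isEmpty()) {
            throw new IllegalArgumentException("Usage: TopicLoader <resource_id>");
        }
        return args[0].trim();
    }
}
```

A main method could call ResourceIdArgs.parse(args) first and pass the result on to the DB query, so that a missing argument produces a clear usage message instead of a failed lookup.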

The method uploadDBEntries loads the article entries from the DB and uploads them to the Topic API one by one. Note that if your content items are small (tweet-sized), you can upload several posts in a single request (up to 50 at a time).
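If you want to try the multi-post variant, one way is to split the loaded entries into batches before uploading. The sketch below shows only the splitting step, using the JDK alone. The class ArticleBatcher and its method partition are hypothetical names introduced for illustration, and the limit of 50 is taken from the note above rather than from the API documentation.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of the sample repository.
final class ArticleBatcher {
    // Upper bound per request, taken from the "up to 50 at a time" note in this post.
    static final int MAX_BATCH_SIZE = 50;

    private ArticleBatcher() {}

    // Splits items into consecutive batches of at most batchSize elements,
    // preserving the original order. The last batch may be smaller.
    static <T> List<List<T>> partition(List<T> items, int batchSize) {
        if (batchSize < 1 || batchSize > MAX_BATCH_SIZE) {
            throw new IllegalArgumentException(
                    "batchSize must be between 1 and " + MAX_BATCH_SIZE);
        }
        List<List<T>> batches = new ArrayList<>();
        for (int start = 0; start < items.size(); start += batchSize) {
            int end = Math.min(start + batchSize, items.size());
            // Copy the view so each batch is independent of the source list.
            batches.add(new ArrayList<>(items.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch could then be serialized in one call, since Gson's toJson turns a List into a JSON array, and sent as the body of a single request. Whether large articles fit into one request this way is something to check against the API limits.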

To upload an article to the Topic API we use the following code:

 
    private void uploadArticleToTopicAPI(ArticleEntry x) {
        Gson gson = new GsonBuilder().create();

        try {
            // The endpoint expects a JSON array, so wrap the single article in brackets.
            String body = "[" + gson.toJson(x) + "]";

            System.out.println("Sending body for id: " + x.getId());

            // POST the article to the Topic API, authenticating with the Mashape key.
            HttpResponse<JsonNode> response = Unirest.post("https://dmitrykey-insiderapi-v1.p.mashape.com/articles/uploadJson")
                    .header("X-Mashape-Key", mashapeKey)
                    .header("Content-Type", "application/json")
                    .header("Accept", "application/json")
                    .body(body)
                    .asJson();

            System.out.println("TopicAPI response:" + response.getBody().toString());
        } catch (UnirestException e) {
            // Pass the exception as the last argument so the logger records the stack trace.
            log.error("Error uploading article to the Topic API", e);
            System.err.println("Error: " + e.getMessage());
        }
    }

Remember that you need to obtain the value for mashapeKey by subscribing to the API. Once subscribed, check the documentation, where you will find your key already inserted: https://market.mashape.com/dmitrykey/topicapi.

After the upload is complete, the articles end up in a search engine on the backend of the Topic API. You can then start sending search requests and getting back themes. In the example below, I uploaded about 10,000 Russian texts and got the following topics:

Что Говорят # What they say
Большие деньги # Big money
Беларуский Бренд # Belarusian brand
Для Ребенка # For a child
Как Выглядит работа # What the job looks like
Как Заработать # How to earn money
В Минском Масс-маркете # In a grocery store in Minsk
Женщины # Women