Tag Archives: api

Fuxi API 1.2 for Chinese sentiment analysis is here

Analyzing Sina Weibo (Chinese Twitter) and Renren (Chinese Facebook) for sentiment is quite tricky. Social media analysis in general, for instance for Russian, is tricky too. There are a few reasons:

  1. Grammar: in short messages there is not much space to spell out correct grammar, so in most cases it is “broken” from the standpoint of classic parsers.
  2. Words: they change frequently, following the social media life of a particular news item, reaction, or maybe even a flash mob.
  3. Sarcasm: the author does not mean the sentiment you deduce by reading the message for the first time. It sometimes takes research to find a visual item that helps understand the sentiment.

Fuxi API is catching up with what’s cooking in Chinese social media by analyzing a vast array of messages in Simplified and Traditional Chinese. We have just released version 1.2 with a number of changes to better tune for the sentiment signal in the avalanche of tweets, blog posts and news articles, all in Chinese. Check it out.

Annotating sentiment with RussianSentimentAnalyzer API in Java

Hello!

In this post we will show how easy it is to start using the RussianSentimentAnalyzer API on mashape from your Java code.

package com.semanticanalyzer;

import com.mashape.unirest.http.HttpResponse;
import com.mashape.unirest.http.JsonNode;
import com.mashape.unirest.http.Unirest;
import com.mashape.unirest.http.exceptions.UnirestException;

public class RussianSentimentAnalyzerMashapeClient {

    private final static String mashapeKey = "[PUT_YOUR_MASHAPE_KEY_HERE]";

    public static void main(String[] args) throws UnirestException {

        String textToAnnotate = "'ВТБ кстати неплохой банк)'";
        String targetObject = "'ВТБ'";

        // These code snippets use an open-source library. http://unirest.io/java
        HttpResponse<JsonNode> response = Unirest.post("https://russiansentimentanalyzer.p.mashape.com/rsa/sentiment/polarity/json/")
                .header("X-Mashape-Key", mashapeKey)
                .header("Content-Type", "application/json")
                .header("Accept", "application/json")
                .body("{'text':" + textToAnnotate + ",'object_keywords':" + targetObject + ",'output_format':'json'}")
                .asJson();

        System.out.println("Input text = " + textToAnnotate + "\n" + "Target object:" + targetObject);
        System.out.println("RussianSentimentAnalyzer response:" + response.getBody().toString());
    }
}

In the code snippet above we’ve used mashape’s Unirest library, which makes HTTP requests in Java super easy.

All you really need to do is register at mashape.com, sign up for the RussianSentimentAnalyzer API, and insert your unique mashape key into the code in place of “PUT_YOUR_MASHAPE_KEY_HERE”, as the value of the mashapeKey variable.

If everything has been set up right, execute the code and you should see the following output:

Input text = 'ВТБ кстати неплохой банк)'
Target object:'ВТБ'
RussianSentimentAnalyzer response:{"sentiment":"POSITIVE","synonyms":"[ВТБ]"}

Now you can easily hook the API into your cool Java app and annotate texts in Russian for sentiment!
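
For instance, you could wrap the call in a tiny helper that returns just the sentiment label. Here is a sketch (the helper name annotateSentiment and the plain-String return type are our own choices, not part of the API):

// Sketch: a hypothetical convenience wrapper around the call shown above.
public static String annotateSentiment(String text, String targetObject) throws UnirestException {
    HttpResponse<JsonNode> response = Unirest.post("https://russiansentimentanalyzer.p.mashape.com/rsa/sentiment/polarity/json/")
            .header("X-Mashape-Key", mashapeKey)
            .header("Content-Type", "application/json")
            .header("Accept", "application/json")
            .body("{'text':'" + text + "','object_keywords':'" + targetObject + "','output_format':'json'}")
            .asJson();
    // The body is a JSON object like {"sentiment":"POSITIVE","synonyms":"[ВТБ]"}
    return response.getBody().getObject().getString("sentiment");
}

With that in place, annotateSentiment("ВТБ кстати неплохой банк)", "ВТБ") would return "POSITIVE" for the example above.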

You’ll find the code on our github here: https://github.com/semanticanalyzer/nlproc_sdk_sample_code

Keep calm and use an API

Sentiment detection for English: lower prices for even more benefit

At SemanticAnalyzer we believe natural language processing APIs should become a commodity. In a good sense. Every developer should be able to afford integrating AI into their cool mobile and web applications.

So we decided to substantially lower the prices for our English sentiment detection API, SentiFindr. You will find the new prices here:

https://www.mashape.com/dmitrykey/sentifindr/pricing

We always welcome your feedback. Integrate now for free and tell us what you think! Just raise a ticket anytime: https://www.mashape.com/dmitrykey/sentifindr/support

 

… and: Keep calm and use an API.

[Photo: Bridge in Helsinki]

 

Russian Sentiment Analyzer API: pricing

We have just published the Russian Sentiment Analyzer API on mashape!

The pricing is pretty straightforward; feel free to give your feedback or request a custom plan.

[Image: RussianSentimentAnalyzer pricing table]

 

You will need to register with mashape in order to start consuming the API.

To get started, click this little button:

RussianSentimentAnalyzer API

A JSON API for Russian-language sentiment analysis

On top of the SemanticAnalyzer technology stack we have launched a sentiment analysis API for the Russian language. It is a JSON API that accepts the following structure:

{
 "text":"some_text_in_utf-8",
 "object_keywords":"object_keywords_in_csv_in_utf-8",
 "output_format":"json or xml"
}

The API synchronously returns json or xml with the following structure:

json:

{
 "sentiment": "${sentimentTag}",
 "synonyms": "${synonyms}"
}

xml:

<?xml version="1.0" encoding="utf-8"?>
<response>
  <sentiment>${sentimentTag}</sentiment>
  <synonyms>${synonyms}</synonyms>
</response>

An example with a real text:

{
 "text":"Самарские пиармены помогут уральскому самородку:
    Засекин.Ру – самарские новости и мнения экспертов #ИгорьХолманских",
 "object_keywords":"ИгорьХолманских,Игорь Холманских",
 "output_format":"json"
}

The system’s response:

{
 "sentiment": "POSITIVE",
 "synonyms": "[ИгорьХолманских]"
}

The response contains the sentiment label and the object with respect to which it was computed.
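
In Java, this request/response cycle maps directly onto the client from the earlier post. Below is a minimal sketch that sends the documented JSON structure and reads back both fields; it assumes the same Unirest imports and mashape endpoint as the RussianSentimentAnalyzerMashapeClient example above, with mashapeKey holding your personal key.

// Sketch: send the documented JSON request and read back both response fields.
// Assumes com.mashape.unirest.http.* imports and the mashape endpoint from the
// Java post above; mashapeKey is your personal mashape key.
HttpResponse<JsonNode> response = Unirest.post("https://russiansentimentanalyzer.p.mashape.com/rsa/sentiment/polarity/json/")
        .header("X-Mashape-Key", mashapeKey)
        .header("Content-Type", "application/json")
        .body("{'text':'Самарские пиармены помогут уральскому самородку #ИгорьХолманских','object_keywords':'ИгорьХолманских,Игорь Холманских','output_format':'json'}")
        .asJson();

String sentiment = response.getBody().getObject().getString("sentiment"); // "POSITIVE"
String synonyms  = response.getBody().getObject().getString("synonyms");  // "[ИгорьХолманских]"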

The system also supports POST requests with a standard set of parameters. In that case, the body of the POST request carries a urlencoded key=value string in the usual http format:

text=my_text&object_keywords=keyword1,keyword2,keyword3&output_format=json
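
If you use Unirest from Java, the same form-encoded body can be produced with .field() calls, which urlencode the parameters for you. A sketch, reusing the endpoint from the JSON example (the form-encoded variant may live at a different URL; check the documentation):

// Sketch: POST with urlencoded form parameters instead of a JSON body.
// Each field() call contributes one key=value pair to the body shown above.
HttpResponse<JsonNode> response = Unirest.post("https://russiansentimentanalyzer.p.mashape.com/rsa/sentiment/polarity/json/")
        .header("X-Mashape-Key", mashapeKey)
        .field("text", "my_text")
        .field("object_keywords", "keyword1,keyword2,keyword3")
        .field("output_format", "json")
        .asJson();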

The API comes with documentation, as well as integration examples in Java, Node, PHP, Python, Objective-C, Ruby and .NET.

Get access to the API:

RussianSentimentAnalyzer API

Lemmatizer / Stemmer for Russian: how to use in your code

This post will guide you through the usage of the Lemmatizer library for the Russian language, which can be ordered by sending a request to [email protected]

First off, what is a lemmatizer? When you have lots of data in morphologically rich languages (i.e. natural languages with a lot of variation per word, expressed through word endings / prefixes), you usually want to find the base form of a word, also called the lemma (hence, lemmatizer). Along with that, you try to resolve the part of speech (POS), i.e. whether the word is a noun, verb, adjective or something else. Once you have found both the base form and the POS tag, you can store them in your database for further processing.

Let’s say your system is a search engine over texts in Russian. In order to increase the recall of your search engine, you want to maximize the document coverage of a user query, no matter what word forms the query has been formulated in. Let’s imagine the user query is:

рестораны Москвы

(restaurants of Moscow)

The first word is the plural of ресторан (restaurant) and the second word is the genitive of Москва (Moscow). Let’s run both words through the lemmatizer API:

    
import info.semanticanalyzer.morph.ru.MorphAnalyzer;
import info.semanticanalyzer.morph.ru.MorphAnalyzerConfig;
import info.semanticanalyzer.morph.ru.MorphAnalyzerLoader;
import info.semanticanalyzer.morph.ru.MorphDesc;
import info.semanticanalyzer.tok.GenericFlexTokenizer;
import info.semanticanalyzer.tok.Token;
import info.semanticanalyzer.tok.Tokenizer;
import info.semanticanalyzer.util.Charsets;
import info.semanticanalyzer.util.IOUtils;

import java.io.File;
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class LemmatizerRuTest {

    public void testBlogPostExample() throws IOException {
        // Load the lemmatizer configuration and build the analyzer.
        File propsFile = new File("conf/lemmatizer-ru.properties");
        Properties properties = new Properties();
        properties.load(new StringReader(IOUtils.readFile(propsFile, Charsets.UTF_8)));
        MorphAnalyzer analyzer = MorphAnalyzerLoader.load(new MorphAnalyzerConfig(properties));

        String phrase = "рестораны Москвы";

        // Tokenize the lowercased phrase and look up the best analysis per token.
        Tokenizer tokenizer = new GenericFlexTokenizer(new StringReader(phrase.toLowerCase()), true);
        Token reusableToken = Token.newReusableToken();
        while ((reusableToken = tokenizer.getNextToken(reusableToken)) != null) {
            String token = reusableToken.getText();
            MorphDesc morphDescription = analyzer.analyzeBest(token);
            if (morphDescription != null && morphDescription.getLemma() != null) {
                info("Most frequent lemma of '" + token + "' is " + morphDescription.getLemma());
                info("Its POS tag: " + morphDescription.getPos());
            }
        }
    }

    private void info(String msg) {
        System.out.println("INFO " + msg);
    }
}

The code above takes the original user query and tokenizes it using the GenericFlexTokenizer, which suits generic Russian texts and is part of the lemmatizer package. If you are more into social media processing, there is the TwitterTokenizer at your service. Then, in the while loop, each token is analyzed, and the most frequent lemma and its POS tag are extracted and printed to standard output (the console). The frequency is based on the lemma’s weight, which is encoded in the lemmatizer’s dictionary. If, however, you don’t want only the most frequent lemma, you can list all lemma candidates by calling analyzer.analyze(), as sketched after the output below. The code produces the following output:

INFO Most frequent lemma of 'рестораны' is ресторан
INFO Its POS tag: NOUN
INFO Most frequent lemma of 'москвы' is москва
INFO Its POS tag: NOUN
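
And here is the sketch for listing all lemma candidates mentioned above. We assume analyzer.analyze(token) returns a List<MorphDesc>; please verify the exact signature against the javadoc shipped with the package.

// Sketch: list every lemma candidate for a token, not only the most frequent.
// Assumes analyze() returns List<MorphDesc>; check the package's javadoc.
List<MorphDesc> candidates = analyzer.analyze(token);
if (candidates != null) {
    for (MorphDesc candidate : candidates) {
        info("Candidate lemma of '" + token + "': " + candidate.getLemma()
                + ", POS tag: " + candidate.getPos());
    }
}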

Now, having both base forms, ресторан and москва, you can search over your documents and find hits like лучший ресторан в Москве (best restaurant in Moscow) and самый уютный ресторан Москвы (the coziest restaurant of Moscow). You could also expand the original words into synonyms and match documents using another condition: the POS tag. This would bring you results with more hits, but constrained to the part of speech of the original user query, which should increase the precision of your search.
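
To make that concrete, here is a sketch of matching on (lemma, POS) pairs. The helper lemmaPosPairs is our own illustration built from the classes used above, not part of the lemmatizer package, and it additionally assumes java.util.Set and java.util.HashSet imports:

// Sketch: represent a text as a set of "lemma/POS" strings, so that the query
// "рестораны Москвы" matches a document containing "ресторан в Москве".
private static Set<String> lemmaPosPairs(MorphAnalyzer analyzer, String text) throws IOException {
    Set<String> pairs = new HashSet<>();
    Tokenizer tokenizer = new GenericFlexTokenizer(new StringReader(text.toLowerCase()), true);
    Token reusableToken = Token.newReusableToken();
    while ((reusableToken = tokenizer.getNextToken(reusableToken)) != null) {
        MorphDesc desc = analyzer.analyzeBest(reusableToken.getText());
        if (desc != null && desc.getLemma() != null) {
            pairs.add(desc.getLemma() + "/" + desc.getPos());
        }
    }
    return pairs;
}

// A document is a hit if it covers all (lemma, POS) pairs of the query:
// lemmaPosPairs(analyzer, document).containsAll(lemmaPosPairs(analyzer, query))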