Finding sentiment in Ruby

Dialogue is the largest conference on computational linguistics in Russia. Historically, it has been supported by Abbyy, Yandex, Moscow State University, the Higher School of Economics, and the Moscow Institute of Physics and Technology. This year the conference hosts a sentiment analysis track. In this post we show the training and test format of the tweets and illustrate how they can be analyzed with our RSA API in Ruby.

The code in this post uses a Mashape key token, which can be obtained by registering a user account at http://market.mashape.com/. After registering, sign up for the freemium plan of the RSA API. You will then have a token that uniquely identifies your access to this API under this subscription plan.
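
Since the token identifies your subscription, it may be preferable to keep it out of the source file. The following is a minimal sketch of one way to do that, assuming the token is stored in an environment variable; the variable name `MASHAPE_KEY` and the helper `mashape_key` are hypothetical and not part of the RSA API or Mashape.

```ruby
# Hypothetical variable name; pick whatever fits your setup.
TOKEN_ENV_VAR = "MASHAPE_KEY"

# Returns the token from the given environment (ENV by default) and
# raises a readable error when it is missing or blank.
def mashape_key(env = ENV)
  key = env[TOKEN_ENV_VAR].to_s.strip
  raise ArgumentError, "Set #{TOKEN_ENV_VAR} to your Mashape key" if key.empty?
  key
end
```

With a helper like this, the `"[INSERT_TOKEN_HERE]"` placeholder in the request headers could be replaced by a call to `mashape_key`.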


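The training and test data provided by the sentiment track organizers has the following format, illustrated here with a single tweet: “Отказ от повышения налогов сохранит и даже ускорит рост ВВП РФ – Sberbank CIB.” (in English: “Refusing to raise taxes will preserve and even accelerate Russia's GDP growth – Sberbank CIB.”)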

      <table name="bank_train_2016">
            <column name="id">70</column>
            <column name="twitid">492546512652500000</column>
            <column name="date">1406267214</column>
            <column name="text">Отказ от повышения налогов сохранит и даже ускорит рост ВВП РФ - Sberbank CIB</column>
            <column name="sberbank">1</column>
            <column name="vtb">NULL</column>
            <column name="gazprom">NULL</column>
            <column name="alfabank">NULL</column>
            <column name="bankmoskvy">NULL</column>
            <column name="raiffeisen">NULL</column>
            <column name="uralsib">NULL</column>
            <column name="rshb">NULL</column>
      </table>

The task is to analyze the sentiment of the entries in the XML tags named “text” and to update the target bank entity with a -1 (NEGATIVE), 0 (NEUTRAL) or 1 (POSITIVE) flag.
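
The rule can be sketched on a plain Hash before any XML or API call is involved. This is only an illustration: the helper names `polarity_flag` and `annotate_row` are hypothetical, unknown labels are assumed to count as neutral, and every entity column with a value other than NULL is assumed to be a target.

```ruby
# Labels as they appear in the task description.
POLARITY_FLAGS = { "POSITIVE" => 1, "NEUTRAL" => 0, "NEGATIVE" => -1 }.freeze

# Maps a polarity label to its flag; anything unrecognized becomes 0.
def polarity_flag(label)
  POLARITY_FLAGS.fetch(label.to_s.upcase, 0)
end

# row:      Hash of column name => string value for one tweet
# entities: column names that denote banks (or telecom operators)
# Returns a copy of the row in which each mentioned entity carries the flag.
def annotate_row(row, entities, label)
  flag = polarity_flag(label).to_s
  row.each_with_object({}) do |(name, value), out|
    mentioned = entities.include?(name) && value != "NULL"
    out[name] = mentioned ? flag : value
  end
end

row = { "text" => "Sample tweet", "sberbank" => "1", "vtb" => "NULL" }
annotated = annotate_row(row, %w[sberbank vtb], "NEGATIVE")
puts annotated["sberbank"]
```

Columns holding NULL are left alone because the tweet does not mention those entities.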

The following code reads an XML file given as the first command line parameter and the type of entities (banks or telecom) given as the second, then updates the input file with sentiment values calculated automatically by the RSA API.

require 'rubygems'
require 'json'
require 'nokogiri'
require 'unirest'

if ARGV.length < 2
  puts "Need xml file as input and type of entities: banks or telecom"
  exit 1
end

supported_entities = ['banks', 'telecom']
supported_entities_telecom = ['beeline', 'mts', 'megafon', 'tele2', 'rostelecom', 'komstar', 'skylink']
supported_entities_banks   = ['sberbank', 'vtb', 'gazprom', 'alfabank', 'bankmoskvy', 'raiffeisen', 'uralsib', 'rshb']

entities_type = ARGV[1]
unless supported_entities.include?(entities_type)
  puts "Unsupported entities type requested. Supported ones are: " + supported_entities.to_s
  exit 1
end

if entities_type == 'banks'
  target_entities = supported_entities_banks
elsif entities_type == 'telecom'
  target_entities = supported_entities_telecom
else
  puts "FATAL ERROR: requested unsupported entities type: " + entities_type
  exit 1
end

# Sends the text to the RSA API and maps the returned polarity label
# to the flag expected by the track: 1, -1 or 0.
def get_sentiment(text)
  response = Unirest.post "https://russiansentimentanalyzer.p.mashape.com/rsa/sentiment/polarity/json/",
    headers: {
      "X-Mashape-Key" => "[INSERT_TOKEN_HERE]",
      "Content-Type" => "application/json",
      "Accept" => "text/plain"
    },
    parameters: { :text => text, :object_keywords => "", :output_format => "" }.to_json

  sentiment = response.body["sentiment"]
  puts "get_sentiment, text=#{text} SENTIMENT=#{sentiment}"

  if sentiment == "POSITIVE"
    return 1
  elsif sentiment == "NEGATIVE"
    return -1
  else
    return 0
  end
end

file_name = ARGV[0]
@doc = Nokogiri::XML(File.open(file_name))
columns = @doc.xpath("//database/table/column")
sentiment_tag = -2
columns.each { |column|
  if column['name'] == 'text'
    # Remember the sentiment of the tweet until its entity column is reached.
    sentiment_tag = get_sentiment(column.content)
  end
  if target_entities.include?(column['name'])
    if sentiment_tag > -2 and column.content != 'NULL'
      puts "updating " + column['name'] + " with " + sentiment_tag.to_s
      column.content = sentiment_tag.to_s
      sentiment_tag = -2
    end
  end
}

File.open(file_name, 'w') {|f| f.write(@doc) }
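
To try the update loop without spending API calls, the classifier can be swapped for a stub. The sketch below is a dry run under several assumptions: it uses REXML from the Ruby distribution instead of Nokogiri so that it needs no extra gems, the `annotate_xml` helper and the `always_negative` stub are hypothetical, and the sample document is a shortened, made-up row in the track's format.

```ruby
require "rexml/document"

BANKS = %w[sberbank vtb gazprom alfabank bankmoskvy raiffeisen uralsib rshb].freeze

# Walks the columns of every table in order. The classifier is any
# callable that takes the tweet text and returns -1, 0 or 1.
def annotate_xml(xml, entities, classifier)
  doc = REXML::Document.new(xml)
  doc.root.each_element("table") do |table|
    flag = nil
    table.each_element("column") do |column|
      name = column.attributes["name"]
      if name == "text"
        flag = classifier.call(column.text.to_s)
      elsif entities.include?(name) && flag && column.text != "NULL"
        column.text = flag.to_s
        flag = nil # one update per tweet, as in the script above
      end
    end
  end
  doc.to_s
end

# Stub classifier (hypothetical): no network access, always NEGATIVE.
always_negative = ->(_text) { -1 }

sample = <<~XML
  <database>
    <table name="bank_train_2016">
      <column name="text">Sample tweet</column>
      <column name="sberbank">1</column>
      <column name="vtb">NULL</column>
    </table>
  </database>
XML

puts annotate_xml(sample, BANKS, always_negative)
```

Passing the classifier as a lambda keeps the XML handling separate from the HTTP call, so a wrapper around `get_sentiment` could be substituted once the dry run looks right.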