Last month, Insider contributed to a joint research project with two other companies: ContextMedia (with 20+ years of experience in traditional media analytics) and YouControl (with access to government data). The goal of the research was to build a biographical and semantic portrait of the Ukrainian politician Dmytro Svyatash in light of the law on car imports in Ukraine. The interactive research results can be found here (in Russian).
Insider used two of its own tools for unstructured text analytics: Insider API for real-time semantic topic creation (screenshots and a description of the system are here) and RSA API for entity-level sentiment analysis.
The resulting system, which was prototyped in under a week, allowed for:
- Navigating through years of data, from 2002 to the present, using keyword searches.
- Understanding the sentiment distribution in the retrieved corpus for a given search.
- Researching quantitative search trends using a visual trend chart.
- Sifting through the generated semantic topics, which group related news items together in search results.
- Getting the heartbeat of Twitter.
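To make the first three capabilities concrete, here is a minimal sketch in Python. It is not the Insider API or the RSA API: the toy corpus, the field names (`title`, `published`, `sentiment`) and the helper functions are hypothetical, and the sentiment labels are assumed to have been assigned already by some upstream analysis.

```python
from collections import Counter
from datetime import date

# Hypothetical toy corpus; field names are illustrative, not the project's schema.
NEWS = [
    {"title": "Car import law debated in parliament",
     "published": date(2012, 3, 14), "sentiment": "negative"},
    {"title": "New car import duties announced",
     "published": date(2012, 11, 2), "sentiment": "neutral"},
    {"title": "Parliament votes on budget",
     "published": date(2013, 1, 20), "sentiment": "neutral"},
    {"title": "Car import law amended",
     "published": date(2013, 6, 5), "sentiment": "positive"},
]


def search(items, keyword):
    """Return the items whose title contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [item for item in items if needle in item["title"].lower()]


def sentiment_distribution(items):
    """Count how many items carry each sentiment label."""
    return Counter(item["sentiment"] for item in items)


def yearly_trend(items):
    """Return (year, item count) pairs sorted by year, ready for a trend chart."""
    counts = Counter(item["published"].year for item in items)
    return sorted(counts.items())


hits = search(NEWS, "car import")
print(sentiment_distribution(hits))
print(yearly_trend(hits))
```

A real system would replace the substring match with a full-text index, but the shape of the output (a label histogram and a time series per query) stays the same.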
In the process we relied on the best open source tools, including Apache Tika, which allowed us to swiftly convert HTML news articles into JSON format while preserving all the important attributes of a news item: title and contents. We additionally crafted and applied our own NER to extract the publication date of each item and place it properly on the time scale.
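The conversion step can be sketched roughly as follows. To keep the example self-contained, it uses Python's standard `html.parser` as a stand-in for Apache Tika, and a simple regular expression as a stand-in for the custom date NER. The class and function names, the `published` field and the `dd.mm.yyyy` date format are all assumptions made for illustration.

```python
import json
import re
from datetime import date
from html.parser import HTMLParser


class NewsItemParser(HTMLParser):
    """Collects the <title> text and the text of <p> elements (stand-in for Tika)."""

    def __init__(self):
        super().__init__()
        self.title_parts = []
        self.body_parts = []
        self._in_title = False
        self._in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._in_paragraph = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p":
            self._in_paragraph = False

    def handle_data(self, data):
        if self._in_title:
            self.title_parts.append(data)
        elif self._in_paragraph:
            self.body_parts.append(data)


# Assumed date format: dd.mm.yyyy (a simplification of real date recognition).
DATE_PATTERN = re.compile(r"\b(\d{1,2})\.(\d{1,2})\.(\d{4})\b")


def extract_date(text):
    """Return the first valid dd.mm.yyyy date found in the text, or None."""
    for match in DATE_PATTERN.finditer(text):
        day, month, year = (int(group) for group in match.groups())
        try:
            return date(year, month, day)
        except ValueError:
            continue  # e.g. 31.02.2012 is not a real date; keep looking
    return None


def html_to_json(html):
    """Convert one HTML news article into a JSON string with title, contents, date."""
    parser = NewsItemParser()
    parser.feed(html)
    parser.close()
    title = "".join(parser.title_parts).strip()
    contents = " ".join(p.strip() for p in parser.body_parts if p.strip())
    published = extract_date(contents)
    record = {
        "title": title,
        "contents": contents,
        "published": published.isoformat() if published else None,
    }
    return json.dumps(record, ensure_ascii=False)
```

Storing the date in ISO format keeps the records sortable as plain strings, which makes placing them on a time scale straightforward.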
Want to do similar research on your own data? Get in touch: [email protected].