Here is the first post in our Semantics to People blog. Have you ever thought of Google, Yahoo!, Bing, Yandex, or my_favourite_lovely_searching_system as problem solvers? As systems that give back not 1,000,000+ web links (hits) for the query you type, but a more specific answer? Or even a single link, the one you currently find on the 10th page of search results after tweaking your query?
So, what happens behind the scenes once we have typed a search query into a text box? Roughly this. Take your query. Break it down into words. Match the words against all indexed documents (web pages, doc files, pdf files, etc.). Give more weight to terms (words that are narrow in meaning and topic) and less weight to function words (prepositions, definite and indefinite articles) and punctuation. Get a huge list of documents back. Rank that list by relevance, and there you go.
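The steps above can be sketched in a few lines of Python. This is a minimal illustration, not how any real search engine is implemented: the stop-word list, the sample documents, and the function names `tokenize` and `rank` are all made up for this post, and the weighting is a plain tf*idf score.

```python
import math
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much larger one.
STOP_WORDS = {"a", "an", "the", "of", "in", "on", "to", "was", "who", "after"}


def tokenize(text):
    """Lowercase the text, split it into words, and drop stop words."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]


def rank(query, documents):
    """Return (score, document index) pairs, highest tf*idf score first."""
    tokenized = [tokenize(d) for d in documents]
    n_docs = len(documents)
    # Document frequency: in how many documents does each word occur?
    df = Counter(w for tokens in tokenized for w in set(tokens))
    results = []
    for idx, tokens in enumerate(tokenized):
        tf = Counter(tokens)
        score = 0.0
        for word in tokenize(query):
            if tf[word]:
                # Rare words get a high idf; words found everywhere get 0.
                idf = math.log(n_docs / df[word])
                score += (tf[word] / len(tokens)) * idf
        if score > 0:
            results.append((score, idx))
    return sorted(results, reverse=True)


# Made-up sample corpus, for illustration only.
docs = [
    "Gorbachev was the last head of the Soviet Union",
    "Yeltsin became president of Russia after Gorbachev",
    "A recipe for borscht",
]
for score, idx in rank("Who was the head of Russia after Gorbachev?", docs):
    print(round(score, 3), docs[idx])
```

In this toy corpus the first two documents get the same score, although only the second one names Gorbachev's successor; the third matches no query term and is not returned at all.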
Does this have anything to do with solving the actual task? Users of a search engine are usually trying to solve some task on the web: find a recipe, study a specific topic, work out how to spend a vacation in a country of interest. What would happen if you typed “Who was the head of Russia after Gorbachev?”
To level up the search experience or enhance users' web presence, we should go for semantic analysis. We believe it is time. tf*idf showed its potential in the past and still does a good job of processing queries and returning relevant hits today. Stemming and stop-word removal were cool in the past and remain part of today's information processing systems. But the needs of everyday life, which grow clearer and clearer, pose new challenges. How do we build a more intelligent system? How do we create a knowledge base? How do we make machine translation smarter and more transparent to its creators?
So we, a small and slick group of developers and researchers, go for computer-aided semantic analysis. We live it. We enjoy the theory and practice behind it. We are passionate about the products built on top of semantic analysis. And we are open to new ideas, both for products and for challenging research. If you know Russian, have a look at our demos. Two main branches are publicly available at the moment, while many others are in active development.
The Flash demo of semantic analysis of a Russian sentence is the first item at the top of the demo page. Type in a Russian sentence with some interesting semantics and see what happens. You need the Flash plugin installed in your browser. We have tested the demo with Firefox 3+ and Opera. Play with the sentence tree: pan it as in Adobe Acrobat Reader if necessary, zoom in and out, and change how much of the tree is displayed (only the top-level nodes, some of the nodes, or all of them) by adjusting the “Degrees of Separation” slider. This library has made the demo possible.
The morphological analysis demo shows the potential of our in-house morphological analyzer, which, by the way, can also solve the direct task: given the base form of a word, find its surface forms. It uses a tricky algorithm to guess the surface forms of new words, such as Nokia (Нокия). We might post something about the tricks later on, when we launch this feature as a separate demo. Stay tuned.
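To give a feel for what guessing surface forms of an unknown word can look like, here is a toy Python sketch of one common approach: pick a declension pattern by the longest matching ending of the base form. It is an assumption made for illustration, not the algorithm our analyzer uses. The `PARADIGMS` table and the function `guess_surface_forms` are hypothetical, cover only two singular noun patterns, and ignore spelling rules.

```python
# Hypothetical ending tables: base-form ending -> endings for the six cases.
# Only two singular noun patterns are listed; a real analyzer needs many more.
PARADIGMS = {
    "ия": ["ия", "ии", "ии", "ию", "ией", "ии"],
    "а": ["а", "ы", "е", "у", "ой", "е"],
}
CASES = [
    "nominative", "genitive", "dative",
    "accusative", "instrumental", "prepositional",
]


def guess_surface_forms(base_form):
    """Guess the case forms of a noun from its longest known ending."""
    # Try longer endings first so the most specific pattern wins.
    for ending in sorted(PARADIGMS, key=len, reverse=True):
        if base_form.endswith(ending):
            stem = base_form[: -len(ending)]
            return {
                case: stem + new_ending
                for case, new_ending in zip(CASES, PARADIGMS[ending])
            }
    return {}  # No known pattern matches this word.


for case, form in guess_surface_forms("Нокия").items():
    print(case, form)
```

A word whose ending is not in the table, such as стол, simply gets an empty result here; handling such cases well is where the real tricks start.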
If you think you have a challenging task related to NLP for the Russian language, get involved and send us your thoughts. We are more than glad to hear from you and to keep our team busy.
More on this later. Enjoy!