ApacheCon North America 2015 and supporting open source

At SemanticAnalyzer we use a number of open source tools and systems to build solutions for our clients. It was our pleasure to attend the recent ApacheCon in Austin, Texas. We met a bunch of new open source developers as well as folks from well-known companies and organizations, such as Pivotal, Google, A9, Cloudera, SkyMind, LinkedIn, Microsoft, IBM, NASA and RedHat. The list is endless.

The weather was warm and the city welcoming.

Austin downtown

Most of our time went to the engineering and scientific talks on a vast array of topics around open source and Apache. Besides, it was the 20th birthday of the Apache HTTP Server!

Apache HTTP Server 20th birthday

Roman Shaposhnik of Pivotal gave a talk on the Apache Incubator: Where It Is Coming From and Where It Is Going. The Apache Incubator is the place where many big Apache projects appear and graduate, and some don’t.

Roman Shaposhnik on Apache Incubator

But that’s fine, because we are all after the quality of the code, the community around it and its adoption by companies, organizations and individuals. The talk focused on how a project gets accepted into the Incubator, how it is assessed during incubation, and what the formal grading and reporting look like. And project chickens and pigs. Check the slides to see what that means :)

Shane Curcuru gave guidance on the very important topic of How to Keep Your Apache Project’s Independence. I think this topic is equally important for any open source project, would you agree, Shane? Trademarks and branding, keeping your project in shape by actively seeking new contributors for long-standing bugs, dealing with difficult parties (corporations), legal action, talking to outside lawyers and much more were covered in Shane’s very intriguing presentation, which reveals the inner workings of running an Apache project from the product point of view.

Radek Maciaszek of DataMine Lab talked about streaming data and dealing with it in R and Apache Storm. Streaming data arises in problems like fraud detection, online advertising and network traffic in general. Usually, when you deal with data, the first priority is to study its properties: find the outliers, where most of the values lie and the noise. The interesting part of Radek’s presentation was beta distributions, which are especially useful for analyzing patterns in streaming data. Prototyping with R and Storm looked rather easy, and the recommended package to use is: http://cran.r-project.org/web/packages/Storm.
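To give a feel for why beta distributions suit streams, here is a minimal sketch in Python (not R, and not taken from Radek’s slides). It assumes one common use: tracking the rate of a yes/no event, such as a click or a flagged transaction, by updating a Beta(alpha, beta) estimate one observation at a time. The class name `BetaTracker`, the `summarize` helper and the uniform Beta(1, 1) prior are our own assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class BetaTracker:
    """Running Beta(alpha, beta) estimate of an event rate over a stream.

    Hypothetical helper for illustration; the default Beta(1, 1) prior
    (uniform over [0, 1]) is an assumption, not something from the talk.
    """
    alpha: float = 1.0  # pseudo-count of positive events seen so far
    beta: float = 1.0   # pseudo-count of negative events seen so far

    def update(self, positive: bool) -> None:
        # Conjugate update: each observation adds one count to alpha or beta,
        # so the state stays constant-size however long the stream is.
        if positive:
            self.alpha += 1.0
        else:
            self.beta += 1.0

    @property
    def mean(self) -> float:
        # Mean of the Beta distribution: current estimate of the event rate.
        return self.alpha / (self.alpha + self.beta)

    @property
    def variance(self) -> float:
        # Variance of the Beta distribution: shrinks as more events arrive.
        total = self.alpha + self.beta
        return (self.alpha * self.beta) / (total * total * (total + 1.0))


def summarize(stream: Iterable[int]) -> BetaTracker:
    """Consume a stream of 0/1 events and return the updated tracker."""
    tracker = BetaTracker()
    for event in stream:
        tracker.update(bool(event))
    return tracker


if __name__ == "__main__":
    tracker = summarize([1, 0, 0, 1, 1])
    print(tracker.mean, tracker.variance)
```

The appeal for streaming is that only two numbers need to be kept per tracked rate, and the variance tells you how much to trust the current estimate.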

IBM Watson researchers and engineers had a few slots reserved, which our CEO Dmitry Kan moderated. Modeling through concepts, semantics and search is so close to what we do at SemanticAnalyzer that we were especially motivated to attend these presentations. It turned out that early versions of IBM Watson would run for dozens of minutes and sometimes dozens of hours. This was not at all acceptable given the real-time nature of the Jeopardy game, so the system had to scale. UIMA DUCC was employed for the processing pipeline, since Apache UIMA was already in use for semantic enrichment of data and reasoning. Here is the live demo of UIMA DUCC: http://uima-ducc-demo.apache.org:42133/jobs.jsp

Andriy Redko shared really practical fu for embedding search capabilities into your web app. This is especially useful if you’ve built an API and would like to provide search over the processed data. All this can be achieved with no sweat using Apache CXF and Lucene. The demo was impressive, with real-time PDF to text conversion with Tika, indexing with Lucene and searching in a friendly UI. Check it out.

Chris Mattmann of NASA gave a very dynamic breadth-first talk about various Apache projects that deal with data extraction and analysis. In his talk “If You Have The Content, Then Apache Has The Technology!” Chris walked the audience through the projects Apache has to offer for your content: data extraction, data representation (like triples), data mining / machine learning and so on. They were looked at from a very practical viewpoint: how quickly can you build them and start using them for the task at hand. Some projects did receive a thumbs up from the Apache Tika creator! Some could use help with improvements. Very practical and useful. And thanks a lot, Chris, for the pull request on luke that you sent prior to the conference!

The new rocking open source technology, currently incubating at Apache, is Apache Ignite. Dmitriy Setrakyan of GridGain presented it in a very informative and structured fashion. Apache Ignite is essentially an in-memory data fabric that, according to Dmitriy, runs as fast on virtualized servers (read AWS) as it does on bare metal. The same technology was also presented in a keynote by Nikita Ivanov, CTO at GridGain.

There was a lot more packed into the 7 (!) parallel session tracks than can possibly be covered here. Go check the talk schedule yourself and enjoy the videos / slides.

The culmination of the 3-day conference came in the form of 5-minute lightning talks hosted by Jim Jagielski and Shane Curcuru, warmed up by great beer. Despite the beer, it was not that _easy_ to stand up in front of an audience of 100+. Dmitry Kan decided to present on luke and announce its Elasticsearch support, which was enabled during the conference. Here is the video, we hope you enjoy it. Fast-forward to 6:32 to listen to Dmitry’s presentation on luke:

ApacheCon was an extraordinary event from the technical standpoint, and a very warm, friendly and relaxed one on the human and social networking side. We highly recommend attending the next ApacheCon Europe in Budapest in September-October 2015!