Altoros’s Insight on Big Data Week in Moscow: Big Data—Big Work!

by Alena Vasilenko, May 8, 2013
At the conference, participants discussed real-time data analytics, prediction services, data science algorithms, Hadoop optimization, and more.

First impressions

The three-day conference featured a number of sessions on real-time data analytics, the development of prediction services, data science algorithms, cutting-edge tools for dealing with big data, methods of Hadoop optimization, and more. In this post, Kirill Grigorchuk, an R&D Engineer at Altoros, shares his impressions of the event.

I always keep an eye on the schedules of the most interesting IT events. The moment I saw that Big Data Week would be held in Moscow, I decided I had to be there. The session topics sounded really appealing, too!

The conference was held in the Danilovskaya Manufacture business center, located on the bank of the Moskva River in a nice area not far from the busy center of Moscow. The place subtly reminded me of London. This part of the city was free of the usual Moscow haste, and the old red-brick buildings, which long ago housed a weaving factory, created a special, leisurely vibe.

The event took place in the office of the Rambler-Afisha company. In the dimmed light of the “crystal balls,” the participants eagerly discussed the upcoming presentations.


Being a part of Hadoop history

The “Big Week” lasted for three days. As always, the first day started with an introductory speech from the organizers, who greeted everybody who came to the conference despite the terrible weather outside.

Then, Alexey Filanovsky of Oracle ran a “Hadoop Ecosystem” workshop. First, he covered the basics: the main principles of HDFS, the peculiarities of MapReduce, and methods of getting data into and out of a Hadoop cluster. After that, he moved on to more sophisticated topics. In total, Alexey delivered four presentations highlighting different aspects of working with Hadoop.
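
To give a flavor of those basics, here is the canonical word-count job written against the standard Hadoop Java MapReduce API. It is a minimal sketch rather than material from the workshop itself: the mapper emits a (word, 1) pair for every token, and the reducer sums the counts per word.

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // The mapper emits (word, 1) for every token in its input split.
  public static class TokenizerMapper
      extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // The reducer sums the counts collected for each word.
  public static class IntSumReducer
      extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    @Override
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // local pre-aggregation
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));   // HDFS input dir
    FileOutputFormat.setOutputPath(job, new Path(args[1])); // HDFS output dir
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}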

I was very much looking forward to the second day, because three teleconference sessions with Silicon Valley companies were scheduled for Friday.

Alex Varshavsky of Talksum gave an overview of a tool that enables near-real-time processing of incoming data. He demonstrated how the system can filter, alert on, aggregate, correlate, and enrich data streams to address the classic big data problem of the three V’s: volume, velocity, and variety.

In his presentation, “Evolution of Architecture for Exponential Growth,” Konstantin Shvachko, Chief Architect at WANdisco (previously a Hadoop developer at eBay and Yahoo), described the main challenges his team faced as data volumes grew and explained how those issues were overcome. Although it was just a teleconference and many miles separated us, this session made me feel like a part of Hadoop history.

The day ended with the presentation “Impala: A Modern SQL Engine for Hadoop” by Justin Erickson of Cloudera. He spoke about Impala’s architecture and implementation and compared the solution against Apache Hive and some other data warehouses. However, he did not answer the hottest question: when Impala would be released.


The magic behind big data

The final day of the “Week” was very informative. Dmitry Fedoruk from Google Ireland revealed one of the secrets of how Google AdWords works. He talked about Photon, a fault-tolerant and scalable system for joining continuous data streams. It sounded like real magic.
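
For readers unfamiliar with the problem, here is a toy sketch of what joining two continuous streams on a shared ID looks like in principle. This is entirely my own illustration in plain Java (the class and event names are made up); Photon’s actual design, with its fault tolerance and scalability, is where the real magic lies.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// A toy, in-memory join of two continuous event streams on a shared ID,
// e.g., ad queries and the clicks they lead to. This only shows the core
// idea, not how Photon actually implements it.
public class StreamJoiner {

  // Events from one stream are buffered by ID until the matching
  // event from the other stream arrives.
  private final Map<String, String> pendingQueries = new HashMap<String, String>();
  private final List<String> joined = new ArrayList<String>();

  public void onQuery(String id, String queryData) {
    pendingQueries.put(id, queryData);
  }

  public void onClick(String id, String clickData) {
    String queryData = pendingQueries.remove(id);
    if (queryData != null) {
      // Both halves have arrived: emit the joined record.
      joined.add(id + ": " + queryData + " + " + clickData);
    }
    // A production system would also handle clicks that arrive before
    // their queries and expire buffered entries that never match.
  }

  public List<String> getJoined() {
    return joined;
  }

  public static void main(String[] args) {
    StreamJoiner joiner = new StreamJoiner();
    joiner.onQuery("q42", "query at 12:01:03");
    joiner.onClick("q42", "click at 12:01:07");
    for (String record : joiner.getJoined()) {
      System.out.println(record);
    }
  }
}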

To be honest, I was not particularly impressed by the presentation from IBM. Although some of the ideas were not bad, the speaker focused a bit too much on the problems of large enterprises, where employees at the lowest level don’t know what the top managers do.

Then, Evgeniy Polyakov of Yandex spoke about some data storage cases. Although I could only partially agree with some of his points, I would give this session a 10 out of 10 for its brilliant examples and artistic delivery.

Former Rambler colleagues Anton Gorokhov and Pavel Mezentsev delivered a compelling session on how the company developed a personalization service to make advertising and news more targeted. They described the data analysis methods used and demonstrated how they were implemented. It took 35 man-months to teach the system to detect who is visiting a website at any given moment (their gender, age, preferences, etc.). So, big data, big work!

After the “Using Hadoop with Mail.ru” presentations by Aleksey Romanenko and Maksim Lapan of Mail.ru Group, I reassessed this web resource. Like some others in the IT community, I used to think that the company’s services were merely for entertainment and not really interesting. In reality, it turned out that a lot of experienced professionals work at the company and that they are early adopters of many cutting-edge technologies. In particular, Maksim delivered a very interesting session on how to optimize HBase.
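
As an illustration of what client-side HBase tuning can involve, here is a generic example using the standard HBase Java client. These particular settings (and the table and column-family names) are my own sketch of common scan optimizations, not necessarily the ones from Maksim’s session.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.client.Result;
import org.apache.hadoop.hbase.client.ResultScanner;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

// A generic example of tuning an HBase scan on the client side.
public class TunedScan {
  public static void main(String[] args) throws IOException {
    Configuration conf = HBaseConfiguration.create();
    HTable table = new HTable(conf, "events");

    Scan scan = new Scan();
    // Fetch rows in larger batches to cut down on RPC round trips.
    scan.setCaching(500);
    // Skip the block cache for one-off full scans, so they do not
    // evict data that serves latency-sensitive random reads.
    scan.setCacheBlocks(false);
    // Read only the column family we actually need.
    scan.addFamily(Bytes.toBytes("d"));

    ResultScanner scanner = table.getScanner(scan);
    try {
      for (Result result : scanner) {
        // process each row here
      }
    } finally {
      scanner.close();
      table.close();
    }
  }
}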

I would like to thank all the event managers who invited such interesting speakers. Events like this are a good opportunity to pick up new ideas and learn from experienced colleagues.


The post was written by Kirill Grigorchuk with the assistance of Alena Vasilenko and edited by Alex Khizhniak.