Hadoop on Windows Azure: Hive vs. JavaScript for Processing Big Data

Alena Vasilenko

For some time, Microsoft did not offer a solution for processing big data in cloud environments. SQL Server is good for storage, but its ability to analyze terabytes of data is limited. Hadoop, which was designed for this purpose, is written in Java and was not available to .NET developers. To close this gap, Microsoft launched the Hadoop on Windows Azure service, which makes it possible to distribute the load and speed up big data computations.

Altoros's R&D engineers evaluated two out-of-the-box ways of processing big data with Hadoop on Windows Azure, Hive queries and JavaScript implementations, and compared their performance.

For the research, we created eight types of queries in both languages and measured how fast they were processed. Since we wanted to test how the system would handle big data, we downloaded information on US Air Carrier Flight Delays from Windows Azure Marketplace and used it to generate a 9.15 GB data set.

The article shows how additional grouping parameters in a query and the type of arithmetic operation affect throughput. It also shows how the speed of calculations depends on the number of MapReduce tasks. In addition, the paper draws conclusions on how the HDFS block size (8 MB, 64 MB, and 256 MB) influences performance. You’ll find two tables and three graphs with the findings.
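To see why block size and the number of MapReduce tasks are linked, here is a rough back-of-the-envelope sketch in Python. It is not taken from the research. It assumes that the 9.15 GB data set is stored as a single file, that 1 GB equals 1,024 MB, and that Hadoop's default behavior applies, where each HDFS block becomes one input split and therefore one map task. The function name `estimated_map_tasks` is made up for this illustration.

```python
import math

# Block sizes compared in the research, in MB.
BLOCK_SIZES_MB = (8, 64, 256)

# Data set size reported in the article, in GB.
DATASET_GB = 9.15


def estimated_map_tasks(dataset_gb, block_mb):
    """Estimate the number of map tasks for a single input file.

    Assumption: one input split (and so one map task) per HDFS block,
    which is Hadoop's default for file-based input. Real jobs may differ
    if split sizes are configured explicitly or the data spans many files.
    """
    dataset_mb = dataset_gb * 1024  # assumption: 1 GB = 1,024 MB
    return math.ceil(dataset_mb / block_mb)


if __name__ == "__main__":
    for block_mb in BLOCK_SIZES_MB:
        tasks = estimated_map_tasks(DATASET_GB, block_mb)
        print(f"{block_mb} MB blocks: about {tasks} map tasks")
```

Under these assumptions, smaller blocks mean many more, shorter map tasks, so per-task scheduling overhead becomes a larger share of the total run time. The measured effect of each block size is covered in the white paper.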

Find out the results of the evaluation in NetworkWorld.

Read the full version of the research in the White Paper.
