
Blog on All Things Cloud Foundry

How to Deploy Hadoop Using Docker Containers

Renat Khasanshyn

Docker Hadoop Meetup Jan 06

At one of our recent meetups, Nasser Manesh of Altiscale shared his experience with deploying multi-tenant Hadoop clusters using Docker. The talk covered the differences between containers and VMs and addressed typical issues with containers, such as configuration, monitoring, and troubleshooting.

Dan Lorenc of Google explained when, why, and how to adopt Docker within your organization.



Hadoop Distributions: Comparison and Top 5 Trends

Kirill Grigorchuk

Ever wondered how Hadoop distros differ from each other? In a recent article for NetworkWorld, I review how Hadoop became what it is today and explore the differences between the standard edition and Hortonworks, Cloudera, and MapR. I also provide insights into 5 major trends that are shaping their evolution in terms of features, ecosystem, enterprise adoption, and more.

Read the article to learn about:

– The top 5 trends currently affecting the evolution of Hadoop distributions
– Why enterprises need Hadoop distros and how they differ
– How YARN has solved the issues present in Hadoop 1.0
– What will become of Hadoop in the foreseeable future

Continue to the article at NetworkWorld: “Comparing the Top Hadoop Distributions.”


Altoros Recognized as One of the Top Hadoop and Big Data Consultants

Volha Kurylionak

Congrats to our Hadoop team! Altoros has been recognized as a proven market leader in Hadoop and big data consulting, according to a recent study by SourcingLine (a Washington, DC-based research company). Their Leaders Matrix leverages proprietary research and methodology to identify top services firms and map their capabilities. Companies are plotted on the matrix based on their proven ability to deliver and their focus on a given service type.

The analysis, based on verified customer interviews, gave Altoros a high ranking on two lists: Top Big Data & BI Consulting Companies and Top Hadoop Consulting Companies. (Bubble size indicates the relative size of the firms.)

[Figure: Hadoop Leaders Matrix]

“All of the selected companies focus primarily on Big Data analytics or Hadoop and have proven they can provide tangible value to data driven enterprises,” says Tim Clarke, Senior Business Analyst at SourcingLine. “However, it is often challenging for prospective buyers to devise a short-list of leading firms. Our research aims to expedite the procurement process and help connect buyers with qualified service providers.”

Take a look at Altoros’s customer ratings or view the results of these two reports:

[Images: Top Big Data & BI Consulting Companies; Top Hadoop Consulting Companies]
Or, read our latest Hadoop studies:


Hadoop Benchmark: Cloudera vs. Hortonworks vs. MapR

Alex Khizhniak

Evaluating Hadoop distributions across 7 workloads

Cloudera, Hortonworks, and MapR are the most popular Hadoop distributions available today. However, even with this short list, there are few unbiased comparisons of their cluster performance. So, today we’re introducing a 65-page research paper that contains a vendor-independent overview of Cloudera, Hortonworks, and MapR distributions.


Vladimir Starostenkov of Altoros compared the throughput of 8-, 12-, and 16-node clusters against that of a 4-node cluster by dividing each larger cluster's data processing speed by the throughput of the 4-node cluster. The results were quite unexpected.
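The normalization described above can be sketched in a few lines of Python. This is a minimal illustration, assuming throughput is recorded per cluster size in a dictionary keyed by node count; the function name `relative_throughput` and all numbers are hypothetical and are not results from the benchmark.

```python
def relative_throughput(throughput_by_nodes, baseline_nodes=4):
    """Divide each cluster's throughput by the baseline (4-node) cluster's throughput."""
    baseline = throughput_by_nodes[baseline_nodes]
    # Sort by node count so the output reads from smallest to largest cluster.
    return {nodes: t / baseline for nodes, t in sorted(throughput_by_nodes.items())}


# Hypothetical throughput figures (MB/s), NOT taken from the benchmark.
measured = {4: 100.0, 8: 200.0, 12: 180.0, 16: 170.0}

for nodes, ratio in relative_throughput(measured).items():
    print(f"{nodes:>2} nodes: {ratio:.2f}x the 4-node throughput")
```

A ratio of 2.0 for the 8-node cluster would mean it processed data twice as fast as the baseline; a ratio that falls as nodes are added signals that the larger cluster is slower overall.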


Hadoop cluster performance: bigger doesn’t mean faster

In a recent interview with TechTarget, our R&D Engineer Dmitriy Kalyada explained why adding nodes to a Hadoop cluster does not always result in better performance. The new benchmark of Hadoop distributions confirms this behavior under several workloads.

For instance, when sorting unstructured text data (the Sort workload), the performance of a MapR cluster grew linearly as we increased its size from 4 to 8 nodes. Beyond that point, as new machines were added, the throughput of each individual node degraded.

[Figure: MapR cluster performance]

As the diagram shows, an 8-node cluster turned out to be faster than clusters of 12 and 16 nodes. The same pattern was observed in the DFSIO write test. Other Hadoop distributions showed similar results under some of the workloads, too.
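To make the per-node effect concrete, here is a minimal Python sketch, assuming the same 4-node baseline: it computes each cluster's speedup over the baseline and its scaling efficiency (speedup divided by the ideal, linear speedup of N/4), then reports the cluster size with the highest total throughput. The helper names and the figures are hypothetical, chosen only to mimic the pattern described above (linear growth up to 8 nodes, then decline), and are not the benchmark's measurements.

```python
def scaling_report(throughput_by_nodes, baseline_nodes=4):
    """Return {nodes: (speedup, efficiency)} relative to the baseline cluster.

    speedup    = T(N) / T(baseline)
    efficiency = speedup / (N / baseline_nodes), i.e. per-node throughput
                 compared with the per-node throughput of the baseline cluster.
    """
    baseline = throughput_by_nodes[baseline_nodes]
    report = {}
    for nodes, throughput in sorted(throughput_by_nodes.items()):
        speedup = throughput / baseline
        ideal_speedup = nodes / baseline_nodes  # perfectly linear scaling
        report[nodes] = (speedup, speedup / ideal_speedup)
    return report


def fastest_size(throughput_by_nodes):
    """Cluster size (node count) with the highest total throughput."""
    return max(throughput_by_nodes, key=throughput_by_nodes.get)


# Hypothetical throughput figures (MB/s), NOT measurements from the benchmark.
measured = {4: 100.0, 8: 200.0, 12: 180.0, 16: 170.0}

for nodes, (speedup, efficiency) in scaling_report(measured).items():
    print(f"{nodes:>2} nodes: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
print("Fastest cluster size:", fastest_size(measured), "nodes")
```

An efficiency of 100% means the cluster scaled linearly; values below 100% mean each node delivers less throughput than it did in the 4-node cluster, which is how a bigger cluster can end up slower overall.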

Download the benchmark to see all the performance results (83 diagrams, 7 types of workloads), including:

  • detailed performance results for 4-, 8-, 12-, and 16-node clusters
  • how the size of a cluster affects data processing speed
  • how different clusters behave under CPU- and disk-bound workloads (including Bayes, DFSIO, Hive aggregation, PageRank, Sort, TeraSort, and WordCount)
  • what issues slow down deployment and how to maximize Hadoop processing speed

Get your copy of “Hadoop Distributions: Cloudera vs. Hortonworks vs. MapR” and let us know what you think about these results.


Hadoop + GPU: Boost Performance of Your Big Data Project by 50x-200x?

Vladimir Starostenkov

Hadoop, an open-source framework that enables distributed computing, has changed the way we deal with big data. Parallel processing with this set of tools can improve performance several times over. The question is, can we make it work even faster? What about offloading calculations from a CPU to a graphics processing unit (GPU) designed to perform complex 3D and mathematical tasks? In theory, if the process is optimized for parallel computing, a GPU could perform calculations 50-100 times faster than a CPU.

Read my article at NetworkWorld to find out what is possible and how you can try this for your large-scale system.


