
Blog on All Things Cloud Foundry

Hadoop Distributions: Comparison and Top 5 Trends

Kirill Grigorchuk

Ever wondered how Hadoop distros differ from each other? In a recent article for NetworkWorld, I outline how Hadoop became what it is today and explore the differences between the standard Apache edition and the Hortonworks, Cloudera, and MapR distributions. I also share insights into five major trends shaping their evolution in terms of features, ecosystem, enterprise adoption, and more.

Read the article to learn about:

– The top 5 trends currently affecting the evolution of Hadoop distributions
– Why enterprises need Hadoop distros and how they differ
– How YARN has solved the issues present in Hadoop 1.0
– What will become of Hadoop in the foreseeable future

Continue to the article at NetworkWorld: “Comparing the Top Hadoop Distributions.”


How to Speed Up AngularJS Apps That Use Internationalization Libraries

Ilya Drabenia


For a while, I was involved in the development of a document workflow management system delivered as a service. The client app was built with AngularJS, and I had to choose the best option for its internationalization (i18n) and localization (l10n). The service had a rather complex UI with plenty of messages to be translated and transferred between the server and the client, so I was concerned about the impact of client-side translation on performance.

In this post, I explore this issue in detail and suggest an approach that speeds up AngularJS internationalization.



Altoros Recognized as One of the Top Hadoop and Big Data Consultants

Volha Kurylionak

Congrats to our Hadoop team! Altoros has been recognized as a proven market leader in Hadoop and big data consulting, according to a recent study by SourcingLine, a Washington, DC-based research company. Its Leaders Matrix uses proprietary research and methodology to identify top services firms and map their capabilities. Companies are plotted on the matrix based on their proven ability to deliver and their focus on a given service type.

The analysis, based on verified customer interviews, ranked Altoros highly on two lists: Top Big Data & BI Consulting Companies and Top Hadoop Consulting Companies. (Bubble size indicates the relative size of each firm.)

[Figure: SourcingLine Hadoop Leaders Matrix]

“All of the selected companies focus primarily on Big Data analytics or Hadoop and have proven they can provide tangible value to data driven enterprises,” says Tim Clarke, Senior Business Analyst at SourcingLine. “However, it is often challenging for prospective buyers to devise a short-list of leading firms. Our research aims to expedite the procurement process and help connect buyers with qualified service providers.”

Take a look at Altoros’s customer ratings or view the results of these two reports:

Or, read our latest Hadoop studies:


Performance Comparison of Ruby Frameworks: Sinatra, Padrino, Goliath, and Ruby on Rails

Eugene Melnikov


The main goal of this article is to find the best framework for a very basic but highly loaded Ruby application. This is an updated version of the comparison first posted in June 2013: we reran all the tests using the latest versions of Sinatra, Padrino, Goliath, and Ruby on Rails. Unfortunately, the Espresso framework we tested last time has disappeared from all the repositories, so it is no longer included.



Performance of RAID Arrays on Windows Azure: An Alternative to Horizontal Scaling

Sergey Balashevich

While working with several NoSQL databases heavily loaded with write requests, we faced a situation where the hard drive became a bottleneck. Scaling the cluster horizontally could easily solve this kind of problem, but it would also increase monthly costs. This is why we decided to look at other options.

When a database starts experiencing HDD performance issues, the first thing that comes to mind is to combine several virtual drives into a RAID array. But how well does that work on the Windows Azure virtual infrastructure? To find out, we compared the performance of a single virtual drive with RAID 0, 1, 4, 5, and 6 arrays, using the Bonnie++ tool for benchmarking the hard drive subsystem.

Below you will find the test results and step-by-step instructions on how to configure a RAID array on your own.
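To put the RAID levels tested here side by side, the sketch below encodes the textbook trade-offs for an array of identical drives: usable capacity, how many drive failures each level survives, and the classic small-write penalty (back-end I/Os per random write). These are standard rules of thumb, not figures measured in this test. The names `RaidProfile` and `raid_profile` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RaidProfile:
    level: int
    usable_drives: int       # drives' worth of capacity left for data
    tolerated_failures: int  # drives that can fail without losing data
    write_penalty: int       # back-end I/Os per small random write (rule of thumb)


# Textbook minimum drive counts for each level.
MIN_DRIVES = {0: 2, 1: 2, 4: 3, 5: 3, 6: 4}


def raid_profile(level: int, drives: int) -> RaidProfile:
    """Return textbook capacity/redundancy trade-offs for `drives` identical drives."""
    if level not in MIN_DRIVES:
        raise ValueError(f"unsupported RAID level: {level}")
    if drives < MIN_DRIVES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DRIVES[level]} drives")
    if level == 0:
        # Striping only: full capacity, no redundancy, one write per write.
        return RaidProfile(0, drives, 0, 1)
    if level == 1:
        # Mirroring: every drive holds a full copy, so each write hits every mirror
        # (a penalty of 2 for the classic two-drive mirror).
        return RaidProfile(1, 1, drives - 1, drives)
    if level in (4, 5):
        # One drive's worth of parity: a small write reads and rewrites data + parity.
        return RaidProfile(level, drives - 1, 1, 4)
    # RAID 6: two independent parity blocks, so a small write touches data + 2 parity.
    return RaidProfile(6, drives - 2, 2, 6)


if __name__ == "__main__":
    for lvl in (0, 1, 4, 5, 6):
        print(raid_profile(lvl, 4))
```

With four drives, RAID 0 keeps all four drives' capacity but survives no failure, while RAID 6 keeps two and survives two. The higher write penalties of parity levels are one reason they may be expected to trail RAID 0 on write-heavy workloads.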


Test 1: RAID performance under Write/Read/Re-write workloads

In the first test, we measured the performance of different RAID arrays for simple read/write operations:

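sudo bonnie++ -d /raid1/ -m 'raid1' -u root -n 100:16384:8192:20 -x10 -s 16g -f > raid1.csv

Bonnie++ was run 10 times (-x10). The file creation tests used 102,400 files (bonnie++ counts the first -n field in multiples of 1,024), each 8 to 16 KB in size, spread across 20 subdirectories. Note that -n takes the maximum file size before the minimum. The sequential read/write tests used a 16 GB data set (-s 16g), and -f skipped the slow per-character tests. Since a Large Windows Azure instance has 7 GB of RAM, the 16 GB data set was more than twice the memory size, which helped us avoid measuring cached data.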
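The bonnie++ options above are easy to misread, so here is a small sketch of the arithmetic behind them. It assumes the option semantics from the bonnie++ man page: the first field of `-n` counts files in multiples of 1,024, and `-s` should be at least twice the RAM (bonnie++ warns otherwise) so that reads are not served from the page cache. The helper names are hypothetical.

```python
from __future__ import annotations

KIB = 1024
MIB = 1024 * 1024


def bonnie_file_count(n_option: str) -> int:
    """Files created by the small-file tests: the first -n field is in units of 1,024."""
    return int(n_option.split(":")[0]) * 1024


def small_file_volume_mib(file_count: int, min_bytes: int, max_bytes: int) -> tuple[float, float]:
    """Lower and upper bounds of the total small-file volume, in MiB."""
    return file_count * min_bytes / MIB, file_count * max_bytes / MIB


def dataset_defeats_cache(dataset_gb: float, ram_gb: float) -> bool:
    """bonnie++ guideline: the -s data set should be at least twice the machine's RAM."""
    return dataset_gb >= 2 * ram_gb


if __name__ == "__main__":
    files = bonnie_file_count("100")  # first field of the -n option used in the test
    low, high = small_file_volume_mib(files, 8 * KIB, 16 * KIB)
    print(f"{files} files, {low:.0f} to {high:.0f} MiB in total")
    print("16 GB data set vs 7 GB RAM defeats cache:", dataset_defeats_cache(16, 7))
```

In other words, the small-file tests touch roughly 0.8 to 1.6 GiB. The 16 GB figure applies to the sequential tests, and it is about 2.3 times the 7 GB of RAM.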

The first test results are shown below. The x-axis shows throughput in megabytes per second, and the y-axis indicates the repetition (each test was run 10 times).

Write test results:




LikeFolio: Invest in What You "Like" (Fox Business Video)

Alex Khizhniak

A scalable architecture based on Redis

Recently, Fox Business interviewed Nicole Sherrod of TD Ameritrade about online stock trading. In the video below, she talks about the success of LikeFolio, a web project that helps online investors by analyzing social media data.

Altoros was proud to help SwanPowers, a partner of TD Ameritrade, build this application, which is based on the “invest in what you know” concept. The system aggregates your conversations, status updates, likes, and check-ins from social networks and translates this data into investment ideas (using IPO information).


LikeFolio was written in Ruby and features a distributed, scalable architecture able to serve 10,000+ users simultaneously. To provide scalability and service availability, LikeFolio was deployed on Amazon’s infrastructure and used Redis, a scalable NoSQL store, for caching and for network scalability through its publish/subscribe (pub/sub) model.
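For readers unfamiliar with the pub/sub model, here is a minimal in-process sketch of the delivery semantics Redis offers:

- Publishers send a message to a named channel.
- Every client currently subscribed to that channel receives it.
- Nothing is stored for late subscribers.
- `PUBLISH` reports how many subscribers got the message.

The toy `Broker` class and the channel name are hypothetical illustrations, not LikeFolio's code. Against a real Redis server, the redis-py client exposes these operations as `Redis.publish()` and `Redis.pubsub().subscribe()`.

```python
from collections import defaultdict
from typing import Callable, DefaultDict, List

# A subscriber callback receives (channel, message).
Handler = Callable[[str, str], None]


class Broker:
    """Toy in-process broker mimicking Redis pub/sub delivery semantics."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[str, List[Handler]] = defaultdict(list)

    def subscribe(self, channel: str, handler: Handler) -> None:
        self._subscribers[channel].append(handler)

    def unsubscribe(self, channel: str, handler: Handler) -> None:
        if handler in self._subscribers.get(channel, []):
            self._subscribers[channel].remove(handler)

    def publish(self, channel: str, message: str) -> int:
        # Fire-and-forget: only current subscribers see the message and nothing
        # is queued. Like Redis PUBLISH, return the number of receivers.
        handlers = list(self._subscribers.get(channel, []))
        for handler in handlers:
            handler(channel, message)
        return len(handlers)


if __name__ == "__main__":
    broker = Broker()
    inbox: List[tuple] = []
    # Two hypothetical app servers listening on the same channel.
    broker.subscribe("price-alerts", lambda ch, msg: inbox.append(("web-1", msg)))
    broker.subscribe("price-alerts", lambda ch, msg: inbox.append(("web-2", msg)))
    delivered = broker.publish("price-alerts", "new signal")
    print(delivered, inbox)
```

Because publishers do not need to know which servers are listening, new application servers can join by subscribing. This decoupling is what makes the model useful for scaling out.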

Read the customer story in our portfolio to learn more about other technologies used.


Want details? Watch the video!


PaaS News Summary: December 2013

Volha Kurylionak


Here is the most significant news from Platform-as-a-Service vendors for December 2013.

Highlights:

  1. SAP Open Sources a Cloud Foundry Service Broker for Their HANA DB
  2. Hangops Panel: Cloud Foundry, Stackato, Apcera, GitHub, Mozilla, and OpenShift
  3. Dell’s Customers Will Get Access to CenturyLink’s Public Cloud Services
  4. ActiveState Releases Stackato v3.0.1
  5. Red Hat Releases OpenShift Enterprise 2 and OpenShift Origin 3, Updates OpenShift Online
  6. Baidu to Use Docker Containers for Their Own PaaS
  7. Microsoft Aims to Boost Azure Use with Cloud OS Network of Partners
  8. CloudBees Now Supports Java EE 7
  9. Engine Yard’s Early Access: Java EE 7, PostgreSQL 9.3, Percona 5.6
  10. An Improved Node.js Buildpack from Heroku
  11. New PaaS Platforms: PLDT Cloud PaaS and SlapRunner



MADlib, a Solution for Big Data Analytics from Pivotal

Sofia Parfenovich

General overview

Many data analytics solutions support the MapReduce paradigm and can work with NoSQL databases. However, most enterprises still rely on mature SQL data stores and, therefore, need traditional analytics solutions to provide in-depth analysis of their business-critical data.

MADlib is a scalable in-database analytics library that provides sophisticated mathematical algorithms for SQL-based systems. It was developed jointly by researchers from UC Berkeley and engineers from Pivotal (formerly EMC/Greenplum). MADlib can be considered an enterprise alternative to Hadoop for machine learning, data mining, and statistics tasks. In addition, it supports time series, which Hadoop could not process appropriately, greatly extending the capabilities for building prediction systems. (For more information, watch a video overview from Pivotal, read this introduction to MADlib, or visit the product page.)
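In-database libraries such as MADlib avoid moving data out of the database by expressing models as user-defined aggregates. The pattern works in three steps:

1. A transition function folds each row into a small state.
2. Partial states computed on different segments are merged.
3. A final function turns the state into a model.

The sketch below illustrates that general pattern for one-variable least squares in plain Python. It is a conceptual illustration under that assumption, not MADlib code; in MADlib itself, linear regression is invoked from SQL (for example, via `madlib.linregr_train`). All names here are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class RegState:
    """Running sums for one-variable least squares; its size is independent of row count."""
    n: int = 0
    sx: float = 0.0
    sy: float = 0.0
    sxx: float = 0.0
    sxy: float = 0.0


def transition(state: RegState, row: tuple[float, float]) -> RegState:
    """Fold one (x, y) row into the state, like an aggregate's transition function."""
    x, y = row
    return RegState(state.n + 1, state.sx + x, state.sy + y,
                    state.sxx + x * x, state.sxy + x * y)


def merge(a: RegState, b: RegState) -> RegState:
    """Combine partial states computed independently (e.g., on different segments)."""
    return RegState(a.n + b.n, a.sx + b.sx, a.sy + b.sy, a.sxx + b.sxx, a.sxy + b.sxy)


def final(state: RegState) -> tuple[float, float]:
    """Solve the normal equations; returns (intercept, slope)."""
    denom = state.n * state.sxx - state.sx ** 2
    if state.n < 2 or denom == 0:
        raise ValueError("need at least two distinct x values")
    slope = (state.n * state.sxy - state.sx * state.sy) / denom
    intercept = (state.sy - slope * state.sx) / state.n
    return intercept, slope


if __name__ == "__main__":
    rows = [(float(x), 2.0 * x + 1.0) for x in range(10)]
    # Pretend the table is split across two segments that aggregate in parallel.
    seg_a = reduce(transition, rows[:5], RegState())
    seg_b = reduce(transition, rows[5:], RegState())
    print(final(merge(seg_a, seg_b)))  # -> (1.0, 2.0)
```

The state stays five numbers no matter how many rows are scanned, and segments can be processed independently before merging. This is why the approach can scale to very large tables.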

Since I already had some experience with Wolfram Mathematica, I was tempted to compare the two products. A presentation claiming high performance and great scalability for MADlib’s built-in machine learning algorithms only boosted my curiosity. Below is one of the slides from that presentation. The solution is supposed to process billions of rows in minutes. Impressive math!



Hadoop Benchmark: Cloudera vs. Hortonworks vs. MapR

Alex Khizhniak

Evaluating Hadoop distributions across 7 workloads

Cloudera, Hortonworks, and MapR are the most popular Hadoop distributions available today. Yet even for this short list, there are few unbiased comparisons of their cluster performance. So, today we’re introducing a 65-page research paper that contains a vendor-independent overview of the Cloudera, Hortonworks, and MapR distributions.


Vladimir Starostenkov of Altoros compared the throughput of 8-, 12-, and 16-node clusters against that of a 4-node cluster: the data processing speed of each larger cluster was divided by the throughput of the 4-node cluster. The results were quite unexpected.
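The normalization described above is simple to reproduce. The sketch below divides each cluster's throughput by the 4-node baseline. It also compares the result with ideal linear scaling (nodes / 4), which is one way to make per-node degradation visible. The function name is hypothetical, and the sample numbers are invented for illustration; they are not results from the paper.

```python
from __future__ import annotations


def scaling_report(throughput: dict[int, float], baseline: int = 4) -> dict[int, tuple[float, float]]:
    """For each cluster size, return (relative speedup, scaling efficiency).

    Relative speedup is the cluster's throughput divided by the baseline
    cluster's throughput. Linear scaling would give nodes / baseline, so
    efficiency = speedup / (nodes / baseline), where 1.0 means perfectly linear.
    """
    base = throughput[baseline]
    report = {}
    for nodes in sorted(throughput):
        speedup = throughput[nodes] / base
        report[nodes] = (speedup, speedup / (nodes / baseline))
    return report


if __name__ == "__main__":
    # Made-up numbers for illustration only, NOT results from the benchmark.
    hypothetical_mb_per_s = {4: 100.0, 8: 200.0, 12: 180.0, 16: 170.0}
    report = scaling_report(hypothetical_mb_per_s)
    for nodes, (speedup, efficiency) in report.items():
        print(f"{nodes:>2} nodes: speedup {speedup:.2f}x, efficiency {efficiency:.0%}")
    fastest = max(report, key=lambda n: report[n][0])
    print("fastest cluster size:", fastest)
```

With numbers shaped like these, speedup peaks at 8 nodes, and efficiency drops below 100% as soon as adding machines stops paying off.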


Hadoop cluster performance: bigger doesn’t mean faster

In a recent interview with TechTarget, our R&D Engineer Dmitriy Kalyada explained why adding nodes to a Hadoop cluster does not always result in better performance. The new benchmark of Hadoop distributions confirms this behavior under several workloads.

For instance, when sorting unstructured text data (the Sort workload), the performance of a MapR cluster grew linearly as we increased its size from 4 to 8 nodes. Beyond 8 nodes, the throughput of each individual node degraded with every machine added.

[Diagram: MapR cluster performance by cluster size]

As the diagram shows, an 8-node cluster turned out to be faster than clusters of 12 and 16 nodes. The same situation was observed in the DFSIO write test. Other Hadoop distributions showed similar results under some of the workloads, too.

Download the benchmark to see all the performance results (83 diagrams, 7 types of workloads), including:

  • detailed performance results for 4-, 8-, 12-, and 16-node clusters
  • how the size of a cluster affects data processing speed
  • how different clusters behave under CPU-bound and disk-bound workloads (including Bayes, DFSIO, Hive aggregation, PageRank, Sort, TeraSort, and WordCount)
  • what issues slow down deployment and how to maximize Hadoop processing speed

Get your copy of “Hadoop Distributions: Cloudera vs. Hortonworks vs. MapR” and let us know what you think about these results.


Big Data in Denmark: Notes from IT Messe 2013

Alex Khizhniak

Although the big data market in Denmark is still young, it is definitely growing.

On Oct 9–10, the IT Messe conference attracted 40+ exhibitors and ~500 attendees to the city of Horsens. At the event, Kim Jonassen, our Managing Director for Denmark, spoke on the state of big data in Denmark. He outlined the types of systems and companies that struggle with big data and explained how distributed processing, NoSQL data stores, and other tools address these issues.



