Blog on All Things Cloud Foundry

Our Blog Has Migrated to a New Location

Alex Khizhniak

Note! New posts on cloud-native transformation are now published at www.altoros.com/blog.

Subscribe there for everything Cloud Foundry, Kubernetes, blockchain, and AI/ML.


Performance Evaluation: MongoDB over NetApp E-Series

Vladimir Starostenkov


NetApp, a provider of high-performance data storage systems, has been adapting its offerings to the requirements of NoSQL databases such as MongoDB. As a result, the company now offers two MongoDB-certified flash storage solutions. Altoros joined the effort to evaluate these products.

This blog post presents some of the performance results for the integrated MongoDB architecture deployed on NetApp E-Series.


Performance Benchmark: Redis Cloud vs. ElastiCache vs. openredis vs. RedisGreen vs. Redis To Go

Vladimir Starostenkov

In most performance comparisons, Redis (an open source key-value cache/store) is treated mainly as a caching-only solution, and others focus on a single provider. However, customers are interested in deeper use of its built-in data types and server-side operations. In production, several workloads with different types of tasks may query your database simultaneously.

For this reason, we designed a scenario that evaluates Redis performance under more complicated conditions: it generates two different types of queries, simple and complex, concurrently. We have published the full performance results (latencies, throughput, etc.), and this blog post summarizes the main findings.


NoSQL Tech Comparison 2014: Cassandra (DataStax), MongoDB, and Couchbase

Alex Khizhniak

Introducing a NoSQL scoring framework

Even if you have years of experience with data-intensive apps, selecting a NoSQL data store for a particular case out of dozens of options may be a daunting task. The variety of databases goes way beyond sheer numbers, so you have to carefully compare and benchmark several options before you can choose the most appropriate solution.

To help companies select the best database based on particular use cases, workloads, or requirements, we decided to come up with a handy template for evaluating NoSQL solutions. While many other comparisons focus only on one or two dimensions, we compiled a scoring framework that approaches the databases from 20+ angles (including performance, scalability, availability, ease of installation, maintenance, data consistency, fault tolerance, replication, recovery, etc.).

As a real-life example of such an evaluation benchmark, today we present “The NoSQL Technical Comparison Report,” which provides an in-depth analysis of the leading NoSQL systems: Cassandra (DataStax), MongoDB, and Couchbase Server. Each of the databases was scored on a scale from 1 to 10 across 21 criteria.

With 29 charts and 30 tables, this paper features a scoring template for evaluating and comparing NoSQL data stores for your particular use case—depending on the weight of each criterion. We also give recommendations on the best ways to configure, install, and use NoSQL databases depending on their specific features.
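To illustrate how criterion weights can combine into a single rating, a weighted average is one common approach. This is an assumption for illustration, not necessarily the exact formula used in the report; s_i is a database's 1-10 score on criterion i, and w_i ≥ 0 is the weight you assign to that criterion for your use case:

```latex
\text{Score} = \frac{\sum_{i=1}^{21} w_i \, s_i}{\sum_{i=1}^{21} w_i},
\qquad s_i \in [1, 10], \quad w_i \ge 0
```

Because the weights are normalized, the result stays on the same 1-10 scale. For example, with only two criteria weighted 3 and 1 and scored 8 and 4, the score is (3·8 + 1·4) / 4 = 7.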

 

Want details? Watch a webinar!


Hadoop Distributions: Comparison and Top 5 Trends

Kirill Grigorchuk

Ever wondered how Hadoop distros differ from each other? In a recent article for NetworkWorld, I review how Hadoop became what it is today and explore the differences between the standard edition and the Hortonworks, Cloudera, and MapR distributions. I also provide insights into five major trends that are shaping their evolution in terms of features, ecosystem, enterprise adoption, etc.

Read the article to learn about:

– The top 5 trends currently affecting the evolution of Hadoop distributions
– Why enterprises need Hadoop distros and how they differ
– How YARN has solved the issues present in Hadoop 1.0
– What will become of Hadoop in the foreseeable future

Continue to the article at NetworkWorld: “Comparing the Top Hadoop Distributions.”


Performance Comparison of Ruby Frameworks: Sinatra, Padrino, Goliath, and Ruby on Rails

Eugene Melnikov


The main goal of this article was to find the best framework for a very basic but highly loaded Ruby application. This is an updated version of the comparison first posted in June 2013: we reran all the tests using the latest versions of Sinatra, Padrino, Goliath, and RoR. Unfortunately, the Espresso framework that we tested last time has disappeared from all the repositories, so it is no longer included.


Performance of RAID Arrays on Windows Azure: an Alternative to Horizontal Scaling

Sergey Balashevich

While working with several NoSQL databases heavily loaded with write requests, we faced a situation where the hard drive became a bottleneck. Scaling the cluster horizontally could easily solve this kind of problem, but it would also increase monthly costs. This is why we decided to look at other options.

When a database starts experiencing HDD performance issues, the first thing that comes to mind is to combine several virtual drives into a RAID array. But how would that work on the Windows Azure virtual infrastructure? To find out, we compared the performance of a single virtual drive with that of RAID arrays of levels 0, 1, 4, 5, and 6, using Bonnie++, a tool for benchmarking the hard drive subsystem.

Below you will find the test results and step-by-step instructions on how to configure a RAID array on your own.
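As a rough, hypothetical sketch (not necessarily the procedure used in this post), software RAID on a Linux VM is commonly assembled with the mdadm tool, assuming data disks attached as /dev/sdc, /dev/sdd, and so on. The function below only prints the commands (a dry run), so it is safe to execute; the helper name, /dev/md0, the ext4 file system, the device names, and the mount point are all placeholders.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints, but does not run, the commands that would create a
# Linux software RAID array with mdadm. print_raid_setup is a hypothetical
# helper; /dev/md0, ext4, and the device names are placeholders. The real
# commands need root and erase the member disks.
set -euo pipefail

# Usage: print_raid_setup LEVEL MOUNT_POINT DEVICE...
print_raid_setup() {
  if (( $# < 3 )); then
    echo "usage: print_raid_setup LEVEL MOUNT_POINT DEVICE..." >&2
    return 1
  fi
  local level="$1" mount_point="$2"
  shift 2
  local devices=("$@")

  # Assemble the member disks into one md device at the requested RAID level.
  echo "mdadm --create /dev/md0 --level=${level} --raid-devices=${#devices[@]} ${devices[*]}"
  # Put a file system on the array and mount it where the benchmark will write.
  echo "mkfs.ext4 /dev/md0"
  echo "mkdir -p ${mount_point}"
  echo "mount /dev/md0 ${mount_point}"
}

# Example: a four-disk RAID 5 array mounted at /raid5.
print_raid_setup 5 /raid5 /dev/sdc /dev/sdd /dev/sde /dev/sdf
```

Printing instead of executing keeps the sketch harmless; after reviewing the output for your own disks, the commands can be run manually with sudo.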

 

Test 1: RAID performance under Write/Read/Re-write workloads

In the first test, we measured the performance of different RAID arrays for simple read/write operations:

sudo bonnie++ -d /raid1/ -m 'raid1' -u root -n 100:8192:16384:20 -x10 -s 16g -f > raid1.csv

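Bonnie++ was run 10 times (-x10). The file creation test (-n 100:8192:16384:20) used files of 8-16 KB in size spread across 20 subdirectories; Bonnie++ counts -n in multiples of 1,024, so 100 means 102,400 files. The sequential Write/Read/Re-write tests used a 16 GB data set (-s 16g), and -f skipped the slower per-character tests. Since a large Windows Azure instance has 7 GB of RAM, the 16 GB data set was more than twice the size of memory, which minimized the effect of caching on the results.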
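The twice-the-RAM reasoning can be checked with a small helper. This is a sketch, assuming a Linux host where /proc/meminfo reports MemTotal in kB; the function names are hypothetical and not part of Bonnie++. To our understanding, Bonnie++ applies the same rule of thumb when it picks its default -s value.

```shell
#!/usr/bin/env bash
# Sketch: size a Bonnie++ data set (-s, in MB) so that it is at least twice
# the machine's RAM and therefore cannot be served from the page cache.
# min_bonnie_size_mb and size_avoids_cache are hypothetical helpers.
set -euo pipefail

# Print the smallest -s value (MB) that is at least twice RAM_MB.
min_bonnie_size_mb() {
  local ram_mb="$1"
  echo $(( 2 * ram_mb ))
}

# Succeed (exit status 0) if SIZE_MB is at least twice RAM_MB.
size_avoids_cache() {
  local size_mb="$1" ram_mb="$2"
  (( size_mb >= 2 * ram_mb ))
}

# On Linux, read the actual RAM size (MemTotal is reported in kB).
if [[ -r /proc/meminfo ]]; then
  ram_kb=$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)
  ram_mb=$(( ram_kb / 1024 ))
  echo "RAM: ${ram_mb} MB; use at least -s $(min_bonnie_size_mb "$ram_mb")"
fi
```

For the test described here, 16 GB (16,384 MB) against 7 GB (7,168 MB) of RAM passes the check, since 16,384 ≥ 2 × 7,168 = 14,336.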

The first test results are shown below. The x-axis shows throughput in megabytes per second, and the y-axis shows the repetition number (each test was run 10 times).

Write test results:


LikeFolio: Invest in What You "Like" (Fox Business Video)

Alex Khizhniak

A scalable architecture based on Redis

Recently, Fox Business interviewed Nicole Sherrod of TD Ameritrade about online stock trading. In the video below, she talks about the success of LikeFolio, a web project that assists online investors by analyzing social media data.

Altoros was proud to help SwanPowers, a partner of TD Ameritrade, build this application, which is based on the “invest in what you know” concept. The system aggregates your conversations, status updates, likes, and check-ins from social networks and translates this data into investment ideas (using IPO information).


LikeFolio was written in Ruby and features a distributed, scalable architecture able to serve 10,000+ users simultaneously. To provide scalability and service availability, LikeFolio was deployed on Amazon’s infrastructure and used Redis, a scalable NoSQL store, for caching and for network scalability through its pub/sub model.

Read the customer story in our portfolio to learn more about other technologies used.

 

Want details? Watch the video!


MADlib, a Solution for Big Data Analytics from Pivotal

Sofia Parfenovich

General overview

There are a number of data analytics solutions that support the MapReduce model and are able to work with NoSQL databases. However, most enterprises still rely on mature SQL data stores and, therefore, need traditional analytics solutions to provide in-depth analysis of their business-critical data.

MADlib is a scalable in-database analytics library that features sophisticated mathematical algorithms for SQL-based systems. MADlib was developed jointly by researchers from UC Berkeley and engineers from Pivotal (formerly EMC/Greenplum). It can be considered an enterprise alternative to Hadoop for machine learning, data mining, and statistics tasks. In addition, MADlib supports time series data, which Hadoop could not process appropriately, greatly extending the capabilities for building prediction systems. (For more information, watch a video overview from Pivotal, read this introduction to MADlib, or visit the product page.)

Since I already had some experience with Wolfram Mathematica, I was tempted to compare the two products. A presentation claiming high performance and great scalability for MADlib’s built-in machine learning algorithms made me even more curious. Below is one of the slides from that presentation: the solution is supposed to process billions of rows in minutes. Impressive math!


Hadoop Benchmark: Cloudera vs. Hortonworks vs. MapR

Alex Khizhniak

Evaluating Hadoop distributions across 7 workloads

Cloudera, Hortonworks, and MapR are the most popular Hadoop distributions available today. However, even with this short list, there are few unbiased comparisons of their cluster performance. So, today we’re introducing a 65-page research paper that contains a vendor-independent overview of Cloudera, Hortonworks, and MapR distributions.


Vladimir Starostenkov of Altoros compared the throughput of 8-, 12-, and 16-node clusters with that of a 4-node cluster: the data processing speed of each larger cluster was divided by the throughput of the 4-node cluster. The results were quite unexpected.
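The normalization can be written out explicitly. In this sketch, T_n denotes the measured throughput of an n-node cluster; the symbols S_n and E_n are our own notation rather than the paper's, and E_n is one possible way to express per-node throughput relative to the 4-node baseline:

```latex
S_n = \frac{T_n}{T_4}, \qquad
E_n = \frac{S_n}{n/4} = \frac{4\,T_n}{n\,T_4}, \qquad
n \in \{8, 12, 16\}
```

Perfectly linear scaling would give S_8 = 2, S_12 = 3, S_16 = 4, and E_n = 1. A value of E_n below 1 means that each node in the larger cluster processes less data than a node in the 4-node cluster, which is the kind of per-node degradation described for MapR beyond 8 nodes.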

 

Hadoop cluster performance: bigger doesn’t mean faster

In a recent interview with TechTarget, our R&D Engineer Dmitriy Kalyada explained why adding nodes to a Hadoop cluster does not always result in better performance. The new benchmark of Hadoop distributions confirms this behavior under several workloads.

For instance, when sorting unstructured text data (the Sort workload), the performance of a MapR cluster grew linearly as we increased its size from 4 to 8 nodes. As more machines were added beyond that, the throughput of each individual node degraded.

Diagram: MapR cluster performance

As the diagram shows, an 8-node cluster turned out to be faster than clusters of 12 and 16 nodes. The same situation was observed in the DFSIO write test. Other Hadoop distributions showed similar results under some of the workloads, too.

Download the benchmark to see all the performance results (83 diagrams, 7 types of workloads), including:

  • detailed performance results for 4-, 8-, 12-, and 16-node clusters
  • how the size of a cluster affects data processing speed
  • how different clusters behave under CPU- and disk-bound workloads (including Bayes, DFSIO, Hive aggregation, PageRank, Sort, TeraSort, and WordCount)
  • what issues slow down deployment and how to maximize Hadoop processing speed

Get your copy of “Hadoop Distributions: Cloudera vs. Hortonworks vs. MapR” and let us know what you think about these results.

