Blog on All Things Cloud Foundry

How to Speed Up AngularJS Apps That Use Internationalization Libraries

Ilya Drabenia

For a while, I was involved in the development of a document workflow management-as-a-service system. The client app was built with AngularJS, and I had to decide on the best option for its internationalization (i18n) and localization (l10n). The service had a rather complex UI, with plenty of messages to be translated and transferred between the server and the client, so I was concerned about the impact of client-side translation on performance.

In this post, I explore this in detail and suggest an approach that speeds up AngularJS internationalization.

(more…)

3 Comments

Altoros Recognized as One of the Top Hadoop and Big Data Consultants

Volha Kurylionak

Congrats to our Hadoop team! Altoros has been recognized as a proven market leader in Hadoop and big data consulting, according to a recent study by SourcingLine (a Washington, DC-based research company). Their Leaders Matrix leverages proprietary research and methodology to identify top services firms and map their capabilities. Companies are plotted on the matrix based on their proven ability to deliver and focus on a service type.

The analysis, based on verified customer interviews, gave Altoros a high ranking on two lists: Top Big Data & BI Consulting Companies and Top Hadoop Consulting Companies. (Bubble size indicates the relative size of the firms.)

Hadoop leaders matrix copy

“All of the selected companies focus primarily on Big Data analytics or Hadoop and have proven they can provide tangible value to data driven enterprises,” says Tim Clarke, Senior Business Analyst at SourcingLine. “However, it is often challenging for prospective buyers to devise a short-list of leading firms. Our research aims to expedite the procurement process and help connect buyers with qualified service providers.”

Take a look at Altoros’s customer ratings or view the results of these two reports:

big_data_bi_consultants_bhadoop_consultants_b
Or, read our latest Hadoop studies:

No Comments

Performance Comparison of Ruby Frameworks: Sinatra, Padrino, Goliath, and Ruby on Rails

Eugene Melnikov

The main goal of this article is to find the best framework for a very basic but highly loaded Ruby application. This is an updated version of the comparison that was first posted in June 2013. We have rerun all the tests using the latest versions of Sinatra, Padrino, Goliath, and RoR. Unfortunately, the Espresso framework that we tested last time has disappeared from all the repositories, so it is no longer included.

(more…)

18 Comments

Performance of RAID Arrays on Windows Azure: an Alternative to Horizontal Scaling

Sergey Balashevich

While working with several NoSQL databases heavily loaded with write requests, we ran into a situation where the hard drive became a bottleneck. Scaling the cluster horizontally could easily solve this kind of problem, but it would also increase the monthly payments. This is why we decided to look at other options.

The first thing that comes to mind when a DB starts experiencing HDD performance issues is to combine several virtual drives into a RAID array, but how will it work with Windows Azure virtual infrastructure? To check this, we compared the performance of a single virtual drive and different RAID arrays (types: 0, 1, 4, 5, and 6) using the Bonnie++ tool for hard drive subsystem verification.

Below you will find the test results and step-by-step instructions on how to configure a RAID array on your own.

 

Test 1: RAID performance under Write/Read/Re-write workloads

In the first test, we measured the performance of different RAID arrays for simple read/write operations:

sudo bonnie++ -d /raid1/ -m 'raid1' -u root -n 100:8192:16384:20 -x10 -s 16g -f > raid1.csv

Bonnie++ was run 10 times (-x10). Each test worked with 100 files of 8-16 KB in size and 20 subdirectories. In total, there were 16 GB of “files” in each iteration. Because a large Windows Azure instance has 7 GB of RAM, the data set was more than twice the size of memory, which let us avoid caching effects.

You can see the first test results below. The x-axis shows megabytes per second, and the y-axis shows repetitions (we ran each test 10 times).

Write test results:

Write_test_x2

(more…)

7 Comments

LikeFolio: Invest in What You "Like" (Fox Business Video)

Alex Khizhniak

A scalable architecture based on Redis

Recently, Fox Business interviewed Nicole Sherrod of TD Ameritrade about online stock trading. In the video below, she talks about the success of LikeFolio, a web project that assists online investors by analyzing social media data.

Altoros was proud to help SwanPowers, a partner of TD Ameritrade, build this application, which is based on the “invest in what you know” concept. The system aggregates your conversations, status updates, likes, and check-ins from social networks and translates this data into investment ideas (using IPO information).

LikeFolio was written with Ruby and features a distributed, scalable architecture able to serve 10,000+ users simultaneously. To provide scalability and service availability, LikeFolio was deployed on Amazon’s infrastructure and utilized Redis—a scalable NoSQL store—for caching and network scalability (the pub/sub model).

Read the customer story in our portfolio to learn more about other technologies used.

 

Want details? Watch the video!

No Comments

MADlib, a Solution for Big Data Analytics from Pivotal

Sofia Parfenovich

General overview

There are a number of data analytics solutions that support the MapReduce principle and are able to work with NoSQL databases. However, most enterprises still rely on mature SQL data stores and, therefore, need traditional analytics solutions to provide in-depth analysis of their business-critical data.

MADlib is a scalable in-database analytics library that features sophisticated mathematical algorithms for SQL-based systems. MADlib was developed jointly by researchers from UC Berkeley and engineers from Pivotal (formerly EMC/Greenplum). It can be considered an enterprise alternative to Hadoop for machine learning, data mining, and statistics tasks. In addition, MADlib supports time series rows, which Hadoop could not process appropriately, greatly extending the capabilities for building prediction systems. (For more information, watch a video overview from Pivotal, read this introduction to MADlib, or visit the product page.)

Since I already had some experience with Wolfram Mathematica, I was tempted to compare the two products. A presentation that claimed high performance and great scalability for MADlib's built-in machine learning algorithms boosted my curiosity even further. Below is one of the slides taken from this document. The solution is supposed to process billions of rows in minutes. Impressive math!

(more…)

4 Comments

Hadoop Benchmark: Cloudera vs. Hortonworks vs. MapR

Alex Khizhniak

Evaluating Hadoop distributions across 7 workloads

Cloudera, Hortonworks, and MapR are the most popular Hadoop distributions available today. However, even with this short list, there are few unbiased comparisons of their cluster performance. So, today we’re introducing a 65-page research paper that contains a vendor-independent overview of Cloudera, Hortonworks, and MapR distributions.

Vladimir Starostenkov of Altoros compared throughput of 8-, 12-, and 16-node clusters against performance of a 4-node cluster. (The speed of data processing of 8-, 12-, and 16-node clusters was divided by the throughput of a 4-node cluster.) The results were quite unexpected.
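The normalization described above can be sketched in a few lines of Python. This is only an illustration: the throughput figures and the `relative_throughput` helper are hypothetical and are not taken from the research paper.

```python
# Hypothetical throughput figures (MB/s), made up for illustration only.
# They are NOT results from the benchmark.
throughput_mb_s = {4: 100.0, 8: 190.0, 12: 240.0, 16: 260.0}


def relative_throughput(measurements, baseline_nodes=4):
    """Divide each cluster's throughput by the baseline cluster's throughput."""
    baseline = measurements[baseline_nodes]
    return {
        nodes: value / baseline
        for nodes, value in measurements.items()
        if nodes != baseline_nodes
    }


relative = relative_throughput(throughput_mb_s)
for nodes, ratio in sorted(relative.items()):
    print(f"{nodes} nodes: {ratio:.2f}x the 4-node cluster")
```

With perfectly linear scaling, an 8-node cluster would score 2.0 on this scale and a 16-node cluster 4.0, so anything below those values points to overhead from growing the cluster.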

 

Hadoop cluster performance: bigger doesn’t mean faster

In a recent interview with TechTarget, our R&D Engineer Dmitriy Kalyada explained why adding nodes to a Hadoop cluster does not always result in better performance. The new benchmark of Hadoop distributions confirms this behavior under several workloads.

For instance, when sorting unstructured text data (the Sort workload), the performance of a MapR cluster grew linearly as we increased its size from 4 to 8 nodes. After that, as new machines were added, the throughput of each individual node degraded.

mapr_performance

As you can see in the diagram, an 8-node cluster turned out to be faster than clusters of 12 and 16 nodes. The same situation was observed in the DFSIO write test. Other Hadoop distributions showed similar results under some of the workloads, too.
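To make the idea of per-node degradation concrete, here is a minimal Python sketch. The totals and the `per_node_throughput` helper are hypothetical; the numbers were chosen only to mimic the pattern described above and are not the measured results.

```python
# Hypothetical total cluster throughput (MB/s) by cluster size.
# Illustrative values only, NOT data from the benchmark.
totals = {4: 100.0, 8: 200.0, 12: 180.0, 16: 176.0}


def per_node_throughput(total_by_cluster_size):
    """Split each cluster's total throughput evenly across its nodes."""
    return {nodes: total / nodes for nodes, total in total_by_cluster_size.items()}


per_node = per_node_throughput(totals)
fastest = max(totals, key=totals.get)  # cluster size with the highest total

for nodes in sorted(per_node):
    print(f"{nodes} nodes: {per_node[nodes]:.1f} MB/s per node")
print(f"Fastest cluster in this made-up data set: {fastest} nodes")
```

In this made-up data set, per-node throughput stays flat from 4 to 8 nodes (linear growth) and then drops, so the 8-node cluster ends up with the highest total.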

Download the benchmark to see all the performance results (83 diagrams, 7 types of workloads), including:

  • detailed performance results for 4-, 8-, 12-, and 16-node clusters
  • how the size of a cluster affects data processing speed
  • how different clusters behave under CPU and disk-bound workloads (including Bayes, DFSIO, Hive aggregation, PageRank, Sort, TeraSort, and WordCount)
  • what issues slow down deployment and how to maximize Hadoop processing speed

Get your copy of “Hadoop Distributions: Cloudera vs. Hortonworks vs. MapR” and let us know what you think about these results.

4 Comments

Big Data in Denmark: Notes from IT Messe 2013

Alex Khizhniak

Although the big data market in Denmark is still young, it is definitely growing.

On Oct 9–10, the IT Messe conference attracted 40+ exhibitors and ~500 attendees to the city of Horsens. At the event, Kim Jonassen, our Managing Director Denmark, spoke on the state of big data in Denmark. He gave an overview of the types of systems and companies that struggle with big data and explained how distributed processing, NoSQL data stores, and other tools address these issues.

(more…)

No Comments

Hosting a Big Data Meetup: Hadoop on Windows Azure from Microsoft First-hand

Volha Kurylionak

On September 25-26, Altoros hosted Big Data Dive ’13, an annual meetup for developers, R&D specialists, and system architects who work with massive amounts of data. Our R&D team and a special guest from Microsoft shared their hands-on experience in Hadoop, NoSQL, Windows Azure, and other big data/cloud technologies.

The agenda included sessions on practical aspects of big data storage and processing:

  • “NoSQL Benchmarking v2.0. Evaluating Performance of Modern NoSQL Solutions.” Following our NoSQL database benchmark (Oct 2012), Dmitriy Kalyada of Altoros presented preliminary results of our new research. The study evaluated new versions of major NoSQL DBs on infrastructure provided by our partner Lunacloud–thanks to them for their support! The final performance data will be released next month.

(more…)

No Comments

A Gaming Bots Contest at a RoR Meetup

Volha Kurylionak

On September 14, Altoros organized a Ruby on Rails meetup that took place in Minsk, Belarus (Eastern Europe). Attended by 80+ Ruby experts from the local dev community, the event featured several sessions on RoR, cloud, and other relevant topics.

During the meetup, the participants had an opportunity to test their programming skills in a contest run on Codenjoy, an open source training gamification framework for developers. For the event, our team created a Ruby connector to this service. Developers who took part in the contest could write bots that played Tetris, a popular PC game from the ’80s.

Watch the video to see how the bots competed against each other:

Ruby fluff talks, tea/coffee, and lots of informal networking—it was a nice, warm event. Check out the photos below.

(more…)

No Comments

Benchmarks and Research
