Blog on All Things Cloud Foundry

MADlib, a Solution for Big Data Analytics from Pivotal

Sofia Parfenovich

General overview

There are a number of data analytics solutions that support the MapReduce paradigm and are able to work with NoSQL databases. However, most enterprises still rely on mature SQL data stores and therefore need traditional analytics solutions for in-depth analysis of their business-critical data.

MADlib is a scalable in-database analytics library that provides sophisticated mathematical algorithms for SQL-based systems. It was developed jointly by researchers from UC Berkeley and engineers from Pivotal (formerly EMC/Greenplum). It can be considered an enterprise alternative to Hadoop for machine learning, data mining, and statistics tasks. In addition, MADlib supports time series analysis, which Hadoop could not handle appropriately, greatly extending the capabilities for building prediction systems. (For more information, watch a video overview from Pivotal, read this introduction to MADlib, or visit the product page.)
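To make the "in-database" idea concrete, here is a minimal sketch in Python. It uses SQLite from the standard library purely as a stand-in for a database such as PostgreSQL or Greenplum, and it does not call MADlib at all. The point it illustrates is that the heavy lifting (summing over every row) runs inside the database as a single SQL aggregate query, and only a handful of numbers travel back to the client to finish an ordinary least squares fit of y = a + b·x. The function name `fit_in_database` and the sample data are hypothetical.

```python
import sqlite3


def fit_in_database(points):
    """Fit y = a + b*x by ordinary least squares, letting SQL compute the sums.

    Hypothetical helper: SQLite stands in for an analytics-capable database.
    Only five aggregate values leave the database, not the rows themselves.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (x REAL, y REAL)")
    conn.executemany("INSERT INTO obs VALUES (?, ?)", points)
    # One pass over the table inside the database engine.
    n, sx, sy, sxx, sxy = conn.execute(
        "SELECT COUNT(*), SUM(x), SUM(y), SUM(x * x), SUM(x * y) FROM obs"
    ).fetchone()
    conn.close()
    # Closed-form least squares solution from the aggregated sums.
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return intercept, slope


if __name__ == "__main__":
    a, b = fit_in_database([(1, 3), (2, 5), (3, 7), (4, 9)])
    print(f"y = {a} + {b}*x")
```

With billions of rows, the same pattern (aggregate where the data lives, ship back only the summary) is what lets an in-database library avoid exporting the whole table to a separate analytics engine.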

Since I already had some experience with Wolfram Mathematica, I was tempted to compare the two products. A presentation claiming high performance and great scalability for MADlib's built-in machine learning algorithms only boosted my curiosity. Below is one of the slides from that presentation: the solution is supposed to process billions of rows in minutes. Impressive math!



Building Stock Trading Strategies: 20% Faster with Hadoop

Sofia Parfenovich

Based on complex mathematical algorithms, automated stock trading solutions take hundreds of factors into account and suggest the right time to place buy/sell orders. Some of these systems can even execute trades without any human involvement. However, if an algorithm omits essential market parameters, it may cause significant losses.

In my guest post for Hortonworks, I shared a real-life example of how Hadoop and data clustering made a stock trading system 20% faster and increased a customer's revenues by 12%. You will learn how data clustering helped to diversify buy/sell strategies and how the right infrastructure improved the system's performance without additional investment.
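The excerpt does not say which clustering algorithm was used, so the following is only a minimal sketch of k-means, one common clustering technique, assumed here for illustration. Each point is a hypothetical feature vector for a trading strategy (say, average holding period in days and average return); strategies that land in the same cluster behave alike, so picking strategies from different clusters is one way to diversify. The `kmeans` function and the sample data are hypothetical and not taken from the original post.

```python
import math


def kmeans(points, k, iterations=100):
    """Plain k-means (Lloyd's algorithm) on tuples of floats.

    Hypothetical sketch: deterministic initialisation from the first k points.
    Returns (labels, centroids).
    """
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
        # Stop once no point changes its cluster.
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids


if __name__ == "__main__":
    # Hypothetical (holding period in days, average return) per strategy.
    strategies = [(1.0, 0.02), (1.2, 0.03), (0.9, 0.025),
                  (10.0, -0.01), (11.0, -0.02), (10.5, -0.015)]
    labels, centroids = kmeans(strategies, k=2)
    print(labels, centroids)
```

In a real system the features would normally be rescaled first, since Euclidean distance is dominated by whichever feature has the largest range; at large scale, the same assignment and update steps map naturally onto MapReduce jobs.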


