When building a Hadoop-based recommendation engine for one of our customers some time ago, I presented the results that the Apriori algorithm produces when building association rules. In my new study, I compare this approach with another one: clustering. The latter produced more relevant recommendations and offered users more options.
The new document contains:
- a comparative table of the Apriori algorithm (for building association rules) vs. the k-means method (for clustering)
- 12 diagrams that feature real-life recommendations produced by both Apriori and k-means
- tips on efficient data pre-processing, including 3 ways to reduce data size and computing time
- 3 methods to improve the quality of recommendations (based on association rules) and advice on how to get the most relevant recommendations (with k-means)
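For readers new to the clustering approach, it can be sketched in a few lines: group users by their viewing histories with k-means, then suggest titles that a user's cluster peers watched but the user has not. This is a minimal pure-Python sketch under our own assumptions (binary watched/not-watched vectors, naive first-k initialization, peer-popularity scoring); the functions `kmeans` and `recommend` and the sample data are hypothetical and do not reproduce the method evaluated in the white paper.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points: Sequence[Vector], k: int, max_iter: int = 50) -> List[int]:
    """Plain Lloyd's k-means; returns the cluster label of each point."""
    # Simplistic (assumed) initialization: the first k points become centroids.
    # Production systems typically use k-means++ or several random restarts.
    centroids = [list(p) for p in points[:k]]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: attach every point to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c])) for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


def recommend(user: int, watched: Sequence[Vector], labels: List[int],
              titles: List[str], top_n: int = 3) -> List[str]:
    """Suggest titles that the user's cluster peers watched but the user has not."""
    peers = [i for i, label in enumerate(labels) if label == labels[user] and i != user]
    scores = {}
    for j, title in enumerate(titles):
        if watched[user][j]:
            continue  # never recommend something the user already watched
        score = sum(watched[i][j] for i in peers)
        if score > 0:
            scores[title] = score
    # Most popular among peers first; ties broken alphabetically.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


# Hypothetical data: one row per user, 1 = watched.
titles = ["Alien", "Aliens", "Blade Runner", "Amelie", "Chocolat", "Notting Hill"]
watched = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0],
    [0, 0, 0, 1, 1, 0],
    [0, 0, 0, 1, 0, 1],
    [0, 0, 0, 0, 1, 1],
]

if __name__ == "__main__":
    labels = kmeans(watched, k=2)
    print(labels)
    print(recommend(1, watched, labels, titles))
```

On this toy data, the three sci-fi viewers end up in one cluster, so user 1 is offered Blade Runner. Unlike association rules, every user in a cluster gets candidates even when no single rule clears a confidence threshold, which is one plausible reason clustering can offer more options.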
Download this white paper and feel free to send me your feedback afterwards.
Here is an overview of the most significant news from the PaaS ecosystem for November 2013.
- Pivotal Released the Pivotal One PaaS Based on Cloud Foundry
- Canonical to Launch an Integrated OpenStack-hosted PaaS
- Verizon to Integrate Cloud Foundry PaaS with Verizon Cloud
- Red Hat Halves the Prices for OpenShift Online’s Gears
- Amazon Launches Virtual Desktops and Adds PostgreSQL Support to RDS
- Heroku1 Enables Salesforce.com-to-Heroku Synchronization
- Windows Azure: Three New Services Are Generally Available Now
- Google’s App Engine 1.8.8 Includes Dedicated Memcache
Cloudera, Hortonworks, and MapR are the most popular Hadoop distributions available today. However, even for a list this short, unbiased comparisons of their cluster performance are hard to find.
We’ve prepared a 65-page research paper that contains a vendor-independent overview of Cloudera, Hortonworks, and MapR distributions. This document provides 83 diagrams that explore performance under 7 types of workloads. Download your copy to learn:
- detailed performance results for 4-, 8-, 12-, and 16-node clusters
- how the size of a cluster affects data processing speed
- how different clusters behave under CPU- and disk-bound workloads (including Bayes, DFSIO, Hive aggregation, PageRank, Sort, TeraSort, and WordCount)
- what issues slow down deployment and how to maximize Hadoop processing speed
Paul Maritz and company just launched Pivotal One and Pivotal CF, and reminded me yet again what a great platform Pivotal is about to become. Earlier today, in a Twitter chat led by John Furrier of SiliconAngle, we discussed whether Pivotal One is vapor–or real. This post elaborates on my view.
Pivotal’s promise is that, using its tool set, an IT architect can fill the entire “meat section” between raw virtual machines and an application, all in one shot. Indeed, at Altoros, we see more and more customer deployments that involve every piece of the pie: NoSQL/NewSQL and Hadoop data stores, real-time and analytics engines, messaging, and apps deployed and scaled with the help of a PaaS layer.
To the naked eye, Pivotal’s offering makes a lot of sense, as it brings a one-stop solution that addresses quite a lot of the needs of next-generation application architecture at an average enterprise IT shop.
On March 13, 2013, when the Pivotal Initiative was announced, a high bar was set for the company: $1B in revenue within 5 years. I believe that a few things must come together for this to happen.
If Pivotal can solve two key challenges–making a quantum leap in market leadership for a few more of its products and integrating the entire product suite into a single platform–it will probably not only achieve $1B in revenue in 5 years, but also have an amazing shot at becoming the bellwether of enterprise software going forward.
Challenge #1 – “best of breed incumbents”
Competing products are quickly becoming “best of breed” incumbents in the five categories of next-generation enterprise software where Pivotal competes:
- Massively Parallel RDBMS, with a focus on analytics
- Next generation databases
Here is a brief overview of the main PaaS news for October 2013.
- The Cloud Foundry Community Advisory Board Kicked Off
- Pivotal Released Cloud Foundry Plug-ins for Maven and Gradle
- The Solum Project from Rackspace to Increase Productivity of OpenStack Developers
- The Progress Pacific Platform: New Functionality
- OutSystems Launches Public Cloud-based PaaS for Building .NET and Java Apps
- OpenStack 2013.2 (Havana) Supports Docker Containers
- dotCloud Changed Its Name to Docker, Inc.
- Clever Cloud Announced Support for the Go Language
- Heroku: Extended Validation SSL Certificates and Public Beta Availability of WebSockets
- Updates to OpenShift Online and OpenShift Origin
Data analytics for a large online store involves a number of challenges: product data may be complex by nature and reach terabytes in size, data stores may be (geo-)distributed, association algorithms may require significant memory resources, and so on.
One of our customers needed a recommendation engine for a media streaming service to increase sales. My task was to develop a model that would provide relevant movie suggestions to users. Due to the extremely large data volume, the customer wanted to avoid clustering, which groups data based on purchasing history. The decision was to go with the Apriori algorithm, which builds association rules from frequent itemsets (sets of items that often appear together in transactions). However, when working with real data, we stumbled upon some limitations.
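The core of Apriori is a level-wise search: count k-item sets, keep those above a minimum support, join the survivors into (k+1)-item candidates, and prune any candidate with an infrequent subset. Rules X → Y are then kept if confidence = count(X ∪ Y) / count(X) clears a threshold. The sketch below is a minimal, single-machine illustration under our own assumptions, not the Hadoop-based implementation from the customer project; the function names `apriori` and `association_rules`, the thresholds, and the viewing histories are all hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set, Tuple

Itemset = FrozenSet[str]
Rule = Tuple[Itemset, Itemset, float, float]  # (antecedent, consequent, support, confidence)


def apriori(transactions: List[Set[str]], min_support: float) -> Dict[Itemset, int]:
    """Return every frequent itemset mapped to its transaction count."""
    n = len(transactions)
    baskets = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for basket in baskets for item in basket}
    frequent: Dict[Itemset, int] = {}
    k = 1
    while candidates:
        # Count step: one full pass over the data per itemset size k.
        counts = {c: sum(1 for b in baskets if c <= b) for c in candidates}
        level = {c: cnt for c, cnt in counts.items() if cnt / n >= min_support}
        frequent.update(level)
        # Join step: merge pairs of frequent k-itemsets into (k+1)-itemsets.
        next_candidates = set()
        for a, b in combinations(level, 2):
            union = a | b
            # Prune step: keep the candidate only if all its k-subsets are frequent.
            if len(union) == k + 1 and all(
                frozenset(sub) in level for sub in combinations(union, k)
            ):
                next_candidates.add(union)
        candidates = next_candidates
        k += 1
    return frequent


def association_rules(frequent: Dict[Itemset, int], n: int,
                      min_confidence: float) -> List[Rule]:
    """Derive rules X -> Y with confidence = count(X | Y) / count(X)."""
    rules: List[Rule] = []
    for itemset, count in frequent.items():
        for size in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, size)):
                # Every subset of a frequent itemset is frequent, so lhs is in the dict.
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, count / n, confidence))
    return rules


# Hypothetical viewing histories: one set of watched titles per user.
transactions = [
    {"Alien", "Aliens", "Blade Runner"},
    {"Alien", "Aliens"},
    {"Alien", "Blade Runner"},
    {"Aliens", "Terminator"},
    {"Alien", "Aliens", "Terminator"},
]

if __name__ == "__main__":
    frequent = apriori(transactions, min_support=0.4)
    for lhs, rhs, support, confidence in association_rules(frequent, len(transactions), 0.6):
        print(sorted(lhs), "->", sorted(rhs), f"support={support:.2f} confidence={confidence:.2f}")
```

With `min_support=0.4` and a confidence threshold of 0.6, the sketch keeps rules such as Blade Runner → Alien (confidence 1.0) and drops Alien → Blade Runner (confidence 0.5). Note that the count step rescans every transaction for each itemset size; on terabytes of data, those repeated passes and the candidate sets held in memory are one reason a distributed implementation becomes necessary.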
In my most recent research paper, “Using the Apriori Algorithm for a Movie Recommendation Engine,” I provide:
- an overview of the 4 most popular data processing algorithms for building association rules
- 3 ways to speed up processing and decrease data size when working with big data
- 3 methods that can improve the quality of search recommendations based on association rules
- pros and cons of implementing the Apriori algorithm for building association rules
- 10 diagrams that illustrate the theory and our findings
Download the white paper to learn more about the Apriori algorithm and what other options (such as clustering) you may have for building a recommendation engine. (Note: The document will be updated with more findings within a month.)
As a speaker at Cloud Expo 2013, one of the largest cloud computing events, Altoros has the opportunity to give away a number of free passes. To get one, schedule a brief meeting with our team at the conference in the comments below.
Or, submit a request to attend the session by Manuel Garcia, “Build Your Own Private PaaS with Cloud Foundry” (4:00 – 4:45 pm, Nov 7, Thu). Manuel, our Director of Operations Argentina, will explain how to set up a Platform-as-a-Service based on Cloud Foundry, using a real-life case as an example. The presentation will cover the key concepts of PaaS, as well as deployment challenges and possible solutions to them.
See you at the Santa Clara Convention Center, Nov 4-7.
On Oct 9–10, the IT Messe conference in Horsens attracted 40+ exhibitors and ~500 attendees. At the event, Kim Jonassen, our Managing Director Denmark, spoke on the state of big data in Denmark. He gave an overview of the types of systems and companies that struggle with big data and explained how distributed processing, NoSQL data stores, and other tools address these issues.
Together with NephoScale, an IaaS provider and our partner, we participated in DataWeek 2013 from Sep 28 to Oct 3. The conference was held in San Francisco and was attended by 2,500+ data experts. It was a nice event with innovation awards and discussions on APIs, real-time data, and middleware. The conference was followed by Rackspace’s afterparty and beer tasting from StrikeIron and NephoScale. Special thanks to the NephoScale team for being with us there–and for the beer, of course. =)
Our next event on the agenda is Big Data TechCon 2013 in San Francisco–meet us there tomorrow and on Oct 17.
Earlier this year, we released the Cloud Foundry Vagrant Installer, a tool that enables developers to run a self-contained partial Cloud Foundry v2 installation inside of a Vagrant virtual machine. Since then, we have been working on a number of updates.
One of the latest things I have added is support for custom buildpacks.
Who can benefit?
- Developers of custom buildpacks
- Developers of an application that requires a particular non-standard buildpack
- Anyone who wants to know if a Buildpack X would work on Cloud Foundry
- Software vendors who want to get their products (in the form of buildpacks) to more users
What are the use cases?
- Developing and testing buildpacks and applications based on custom buildpacks
- Training: Getting anyone new to Cloud Foundry to try it out with any buildpack
- Marketing: Distribute your custom buildpacks (for example, IBM Liberty) as a one-click installer
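Cloud Foundry v2 buildpacks follow the detect/compile/release convention: `bin/detect` is called with the app's build directory and exits with status 0 if the buildpack can stage the app. As a rough sketch of the kind of script you might iterate on inside the Vagrant VM, here is a detect step written in Python. The buildpack name, the marker file, and the choice of Python (which assumes an interpreter is available on the staging stack) are all hypothetical.

```python
import os
import sys
from typing import List

# Hypothetical values for illustration only.
BUILDPACK_NAME = "buildpack-x"
MARKER_FILE = "buildpack-x.yml"


def main(argv: List[str]) -> int:
    """Detect step: return 0 (and print a name) if this buildpack applies to the app."""
    if len(argv) != 2:
        print(f"usage: {argv[0] if argv else 'detect'} BUILD_DIR", file=sys.stderr)
        return 1
    build_dir = argv[1]
    # Claim the app only if it ships the (assumed) marker file.
    if os.path.isfile(os.path.join(build_dir, MARKER_FILE)):
        print(BUILDPACK_NAME)
        return 0
    return 1
```

To use this as an executable `bin/detect`, the file would end with `if __name__ == "__main__": sys.exit(main(sys.argv))`; keeping the logic in `main` makes it easy to test against sample app directories before pushing to the VM.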
You can download the Cloud Foundry Vagrant Installer here. Or, read more about support for custom CF buildpacks.