The List of Featured Graph Database Overviews and Benchmarks

by Eugene Lahansky, June 10, 2013

The research papers provide tips on how to model the behavior of a graph database, as well as to detect potential issues before implementing it.

Graph data stores provide index-free adjacency, which can result in much better performance on connected data than relational database management systems (RDBMS) deliver. Naturally, performance is the main concern for those who work with such databases. To predict the behavior of a graph database and find potential issues before actually implementing it, developers refer to research papers that help to simulate the workloads of the future system. This post shares some useful overviews of graph databases and benchmarks.
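To make the idea of index-free adjacency concrete, here is a minimal Python sketch. The toy graph, the names `adjacency` and `edge_rows`, and both helper functions are assumptions made up for illustration; they do not come from any of the papers below or from a real database engine. The scan is a deliberately naive stand-in for a relational lookup, since a real RDBMS would normally use an index whose cost still grows with table size.

```python
# Hypothetical toy graph: each node stores direct references to its neighbors.
adjacency = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave"],
    "dave": [],
}

# The same edges as flat (source, target) rows, roughly how a relational
# table might hold them.
edge_rows = [(src, dst) for src, dsts in adjacency.items() for dst in dsts]


def neighbors_by_adjacency(node):
    """Follow the stored references; the work depends on the node's degree."""
    return adjacency[node]


def neighbors_by_scan(node):
    """Scan every edge row; the work depends on the total number of edges."""
    return [dst for src, dst in edge_rows if src == node]


print(neighbors_by_adjacency("alice"))
print(neighbors_by_scan("alice"))
```

Both functions return the same neighbors. The difference is how much data each one has to touch to find them, and that gap widens with every extra hop in a traversal.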

Useful publications

There are multiple resources that describe the basics of graph databases.

Survey of Graph Database Models. This paper generalizes the research conducted in the field of graph database modeling. Concentrating on data structures, query languages, and integrity constraints, the authors compare graph database models against network, relational, semantic, object-oriented, and other influential database models. In addition, the paper provides information on levels of abstraction, base data structure, information focus, and many other characteristics of modern graph databases.

Graph Databases. This book published by O’Reilly Media discusses how graph databases can help you to manage and query highly connected data. Through examples, you will learn how to design and implement a graph database, discover alternative methods of storing data, such as relational and NoSQL databases, as well as learn the difference between these data storage models.

The Current State of Graph Databases. This research gives an overall summary of the current state of graph databases. Enumerating different categories, algorithms, and paradigms, the authors describe the graph database models in use today.

Graph DB benchmarks

Below is a collection of comparisons with measured performance results. These research papers might be useful when choosing a graph database for your application.

Neo4j vs. Sparksee vs. OrientDB vs. a native RDF repository vs. SGDB

This publication presents the results of a comparative test run against Neo4j, Sparksee, OrientDB, a native RDF repository, and SGDB on a low-end machine with a two-core 2.4 GHz Intel processor and 2 GB of RAM. Data sets ranged in size from 1K to 1M. The workload included operations such as the insertion of elements, local traversals, and global traversals. The preliminary results revealed issues with loading larger data sets into the graph databases. In addition, overall performance was typically poor when the databases performed global traversal operations on larger networks. However, performance was stable for local traversals with 2–3 hops.
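The three kinds of operations in that workload can be illustrated with a short Python sketch. This is not the benchmark's actual code: the in-memory dictionary, the function names, and the ten-node chain are all assumptions chosen to show the difference between a hop-limited (local) traversal and one that has to touch the whole graph (global).

```python
from collections import deque


def add_edge(graph, a, b):
    """Insert an undirected edge, creating both nodes if needed."""
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)


def local_traversal(graph, start, max_hops):
    """Breadth-first search that stops expanding after max_hops."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, hops = frontier.popleft()
        if hops == max_hops:
            continue  # hop budget used up, do not expand this node
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, hops + 1))
    return seen


def global_traversal(graph):
    """Visit every node and count the connected components."""
    unvisited = set(graph)
    components = 0
    while unvisited:
        start = next(iter(unvisited))
        # A shortest path never needs more hops than there are nodes.
        unvisited -= local_traversal(graph, start, max_hops=len(graph))
        components += 1
    return components


graph = {}
for i in range(9):
    add_edge(graph, i, i + 1)  # hypothetical chain 0-1-...-9

print(sorted(local_traversal(graph, 0, max_hops=2)))
print(global_traversal(graph))
```

A local traversal only touches the neighborhood of its start node, so its cost stays roughly flat as the graph grows. A global traversal must visit every node, which is consistent with the paper's observation that global operations degraded on larger networks.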

Sparksee

This benchmark was designed to test the scalability and performance of Sparksee (formerly known as DEX) for applications with large data sets. The authors tracked how many nodes and edges the database could create, the resulting size of the database, the time it took to load the data, and how many traversals could be made per unit of time. According to the results, Sparksee loads and queries data fast enough to handle large data sets. In addition to performing well with billions of objects, the database uses only 36 bytes per stored object.

Neo4j vs. MySQL

This publication compares MySQL against Neo4j to find out which one is more suitable for a data provenance system. In addition to structural query results for MySQL and Neo4j, the comparison takes into account database size and required disk space. Neo4j demonstrated somewhat better performance than MySQL when processing most query types. The graph database was much faster (sometimes exceeding the performance of the relational database by a factor of 10) when processing traversal queries. On the other hand, Neo4j required up to twice as much disk space as MySQL, and MySQL needed more disk space in only one of the 12 tests.
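To show what a traversal query looks like on the relational side, here is a minimal sketch that finds all ancestors of a data item in a provenance table. It uses Python's built-in `sqlite3` module as a stand-in for MySQL, and the `derived_from` table, its columns, and the sample rows are hypothetical; the paper's actual schema and queries are not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical provenance table: each row says "child was derived from parent".
conn.execute("CREATE TABLE derived_from (child TEXT, parent TEXT)")
conn.executemany(
    "INSERT INTO derived_from VALUES (?, ?)",
    [
        ("report", "table"),
        ("table", "raw_a"),
        ("table", "raw_b"),
        ("raw_a", "sensor"),
    ],
)

# Every additional hop is another join against the same table.
ancestors_sql = """
WITH RECURSIVE ancestors(name) AS (
    SELECT parent FROM derived_from WHERE child = ?
    UNION
    SELECT d.parent
    FROM derived_from AS d
    JOIN ancestors AS a ON d.child = a.name
)
SELECT name FROM ancestors ORDER BY name
"""

ancestors = [row[0] for row in conn.execute(ancestors_sql, ("report",))]
print(ancestors)
```

The recursive query repeats a self-join for each level of ancestry, whereas a graph database would follow stored relationships from node to node. That structural difference is one plausible reason traversal queries favored Neo4j in this comparison.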

AllegroGraph vs. Sparksee vs. HypergraphDB vs. InfiniteGraph vs. Neo4j vs. Sones

This paper provides a comparison of current graph database models, including general features for data storage and querying, data modeling features (data structures, query languages, and integrity constraints), and support for essential graph queries. The bottom line is that the current graph database models still need to mature. In particular, the ecosystem should define standard graph database languages for defining, manipulating, and querying data and notions of integrity constraints to preserve the consistency of the database.

These benchmarks suggest that graph databases can be more effective than relational ones, particularly for traversal-heavy queries. They also show that there are many different approaches to preparing and running tests, and some of the sources above describe those approaches and algorithms in detail. If you would like to share the latest benchmarks of graph databases, feel free to let us know in the comments.