The List of Featured Graph Database Overviews and Benchmarks
Many graph data stores provide index-free adjacency, which can make them considerably faster than a traditional RDBMS on highly connected data. Naturally, performance is a main concern for those who work with such databases. To predict how a graph database will behave and to find potential issues before adopting it, developers use benchmarks that simulate the workloads real users will create. This post covers some useful graph database overviews and benchmarks.
Graph DB overviews
There are plenty of resources and publications that describe the basics of graph databases.
- Survey of Graph Database Models: This paper generalizes the research conducted in the field of graph database modeling. Concentrating on data structures, query languages, and integrity constraints, the authors compare graph database models against network, relational, semantic, object-oriented, and other influential DB models. In addition, the paper provides information on levels of abstraction, base data structure, information focus, and many other characteristics of today’s graph databases.
- Graph Databases: This book published by O’Reilly discusses how graph databases can help you to manage and query highly connected data. Through examples, you will learn how to design and implement a graph database, discover alternative methods of storing data, such as relational and NoSQL databases, as well as learn the difference between these data storage models.
- The Current State of Graph Databases: This paper gives an overall summary of the current state of graph databases. Enumerating different categories, algorithms, and paradigms, the authors describe the graph database models in use today.
Graph DB benchmarks
Getting valid performance results is not easy. The good news is that there are a number of graph database benchmarks available. Below you will find a collection of comparisons that have been published over the past couple of years. These might be useful when choosing the best option for your application.
1) Neo4j vs. DEX vs. OrientDB vs. a native RDF repository vs. SGDB
This publication presents the results of a comparative test run against Neo4j, DEX, OrientDB, a native RDF repository, and SGDB on a low-end machine with a two-core 2.4 GHz Intel processor and 2 GB of RAM. Data sets ranged in size from 1 K to 1 M. The workload included operations such as element insertion, local traversals, and global traversals. The preliminary results revealed issues with loading larger data sets into graph databases. In addition, overall performance was typically poor when the databases performed global traversals on larger networks. However, performance was stable for local traversals of 2–3 hops.
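The three workload types named above can be sketched in a few lines. This is only an illustration under stated assumptions: it uses an in-memory Python dictionary as a stand-in for a database, so it does not reproduce the publication's setup or numbers, and all function names (`build_graph`, `local_traversal`, `global_traversal`, `timed`) are hypothetical.

```python
import random
import time
from collections import deque


def build_graph(num_nodes, num_edges, seed=0):
    """Insertion workload: create nodes, then add random undirected edges."""
    rng = random.Random(seed)
    graph = {node: set() for node in range(num_nodes)}
    for _ in range(num_edges):
        a = rng.randrange(num_nodes)
        b = rng.randrange(num_nodes)
        if a != b:  # skip self-loops
            graph[a].add(b)
            graph[b].add(a)
    return graph


def local_traversal(graph, start, max_hops):
    """Local traversal: breadth-first search limited to max_hops from start."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return seen


def global_traversal(graph):
    """Global traversal: touch every node while counting connected components."""
    unvisited = set(graph)
    components = 0
    while unvisited:
        start = next(iter(unvisited))
        # A hop limit of len(graph) is enough to cover a whole component.
        unvisited -= local_traversal(graph, start, max_hops=len(graph))
        components += 1
    return components


def timed(fn, *args, **kwargs):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Sizes are illustrative only; the edge count per node is an assumption.
    for size in (1_000, 10_000):
        graph, t_insert = timed(build_graph, size, size * 3)
        _, t_local = timed(local_traversal, graph, 0, 3)
        _, t_global = timed(global_traversal, graph)
        print(f"{size:>6} nodes: insert={t_insert:.4f}s "
              f"local={t_local:.4f}s global={t_global:.4f}s")
```

Even in this toy version, the local traversal touches only the neighborhood of one node, while the global traversal has to visit the whole graph, which is why the two tend to scale so differently as the data set grows.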
2) DEX
Another benchmark was designed to test the scalability and performance of DEX for applications with very large data sets. The authors tracked how many nodes and edges the database could create, the resulting size of the database, the time it took to load the database, and how many traversals could be made per unit of time. According to the results of this test, DEX provides sufficient loading and querying speed to deal with large data sets. Besides performing well with billions of objects, the database uses only 36 bytes per object.
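The metrics tracked in this benchmark reduce to a few ratios. The sketch below shows one way to derive them; the `LoadReport` class and its field names are hypothetical, and it assumes that an "object" means a node or an edge. The input numbers in the usage example are made up to show the arithmetic and are not results from the benchmark.

```python
from dataclasses import dataclass


@dataclass
class LoadReport:
    """Raw measurements from one benchmark run (all fields are assumptions)."""
    nodes: int
    edges: int
    load_seconds: float
    database_bytes: int
    traversals: int
    traversal_seconds: float

    @property
    def objects(self):
        # Assumption: every node and every edge counts as one object.
        return self.nodes + self.edges

    @property
    def bytes_per_object(self):
        return self.database_bytes / self.objects

    @property
    def objects_loaded_per_second(self):
        return self.objects / self.load_seconds

    @property
    def traversals_per_second(self):
        return self.traversals / self.traversal_seconds


if __name__ == "__main__":
    # Illustrative inputs only, chosen so the storage ratio is easy to check.
    report = LoadReport(
        nodes=1_000_000,
        edges=4_000_000,
        load_seconds=50.0,
        database_bytes=180_000_000,
        traversals=2_000,
        traversal_seconds=4.0,
    )
    print(report.bytes_per_object)           # 180,000,000 / 5,000,000
    print(report.objects_loaded_per_second)  # 5,000,000 / 50
    print(report.traversals_per_second)      # 2,000 / 4
```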
3) Neo4j vs. MySQL
This publication compares MySQL against Neo4j to find out which of them is more suitable for a data provenance system. In addition to structural query results for MySQL and Neo4j, the comparison takes into account database size and required disk space. Neo4j demonstrated somewhat better performance than MySQL when processing most query types. The graph database was much faster (sometimes by a factor of 10) when processing traversal queries. On the other hand, Neo4j required up to twice as much disk space as MySQL; MySQL needed more space than Neo4j in only one of the 12 tests.
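One way to see why traversal queries tend to favor a graph store: in a relational schema every hop is another join against an edge table, whereas a graph model simply follows stored neighbor references. The sketch below contrasts the two styles. It is an assumption-laden illustration, not the publication's setup: SQLite (from the Python standard library) stands in for MySQL, a plain dictionary stands in for Neo4j, and the `edge` table, its columns, and all function names are hypothetical. It only shows that both approaches return the same nodes and makes no timing claims.

```python
import sqlite3

# Tiny directed example graph: 1 -> 2 -> 3 -> 4 and 1 -> 5 -> 6 -> 7.
EDGES = [(1, 2), (2, 3), (3, 4), (1, 5), (5, 6), (6, 7)]


def make_connection(edges):
    """Relational model: one row per edge, with an index on the source column."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src INTEGER, dst INTEGER)")
    conn.execute("CREATE INDEX edge_src ON edge (src)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    return conn


def make_adjacency(edges):
    """Graph model: each node keeps a direct list of its neighbors."""
    adjacency = {}
    for src, dst in edges:
        adjacency.setdefault(src, []).append(dst)
    return adjacency


def reachable_sql(conn, start, max_hops):
    """Nodes reachable in 1..max_hops hops; each hop joins the edge table."""
    rows = conn.execute(
        """
        WITH RECURSIVE walk(node, depth) AS (
            SELECT ?, 0
            UNION
            SELECT e.dst, w.depth + 1
            FROM walk AS w JOIN edge AS e ON e.src = w.node
            WHERE w.depth < ?
        )
        SELECT DISTINCT node FROM walk WHERE depth > 0
        """,
        (start, max_hops),
    ).fetchall()
    return {row[0] for row in rows}


def reachable_adjacency(adjacency, start, max_hops):
    """Nodes reachable in 1..max_hops hops; each hop follows neighbor lists."""
    frontier = {start}
    found = set()
    for _ in range(max_hops):
        frontier = {n for node in frontier for n in adjacency.get(node, ())}
        found |= frontier
    return found


if __name__ == "__main__":
    conn = make_connection(EDGES)
    adjacency = make_adjacency(EDGES)
    print(sorted(reachable_sql(conn, 1, 2)))
    print(sorted(reachable_adjacency(adjacency, 1, 2)))
```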
4) AllegroGraph vs. DEX vs. HypergraphDB vs. InfiniteGraph vs. Neo4j vs. Sones
This paper provides a comparison of current graph database models, including general features (for data storage and querying), data modeling features (data structures, query languages, and integrity constraints), and support for essential graph queries. The bottom line is that current graph DB models still need to mature. In particular, the ecosystem should define standard graph database languages (for defining, manipulating, and querying data) and notions of integrity constraints (to preserve the consistency of the database).
These benchmarks suggest that graph databases can be more effective than relational ones, particularly for traversal-heavy queries. They also show that there are many different approaches to preparing and running tests, and some of the sources above describe these approaches and algorithms in detail. If you know of other recent graph database benchmarks, feel free to let us know.