One of the main requirements for modern information systems is a high data processing rate. A popular way to meet this requirement is to use high-performance databases. This article reviews and compares the performance of two popular databases: Scylla and Cassandra.
These databases use the same structure, which makes migration from one to the other easier. The main difference between them is that Scylla is written in C++, whereas Cassandra is written in Java.
So Scylla has the following performance advantages:
Therefore, Scylla should, in theory, be a higher-performance database than Cassandra. But theoretical results may differ from practical ones because of the specifics of the data and the queries. So let's run comparison benchmarks of data access in Scylla and Cassandra.
The performance benchmarking process for the two databases follows these principles:
Benchmarking is an iterative procedure: the cyclic execution of predefined algorithms, usually the simplest ones. In our case, these algorithms are the data writing and reading procedures for each database, Scylla and Cassandra. The general structure of the benchmark process is shown below.
Note also that all benchmark operations are performed on the same data. The results are presented as tables and diagrams to make analysis and comparison easier.
Let's leave out the technical details of the benchmarking process and just present its results.
The write test consists of writing data into the database and measuring a number of parameters that describe its performance. These parameters are grouped according to the data processes described above and have numerical values:
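The iterative procedure described above can be sketched in Python as follows. This is a minimal illustration and not the harness used to produce the results in this article: `run_benchmark`, `BenchmarkResult`, and the in-memory `dict` that stands in for a database are all hypothetical names chosen for the example. Against a real cluster, the `write` and `write_then_read` callables would wrap driver calls instead.

```python
import time
from dataclasses import dataclass
from typing import Callable, Iterable, Tuple


@dataclass
class BenchmarkResult:
    operations: int          # number of operations executed
    total_time_s: float      # wall-clock time of the whole run, in seconds
    op_rate: float           # operations per second
    mean_latency_ms: float   # average latency of a single operation
    max_latency_ms: float    # latency of the slowest single operation


def run_benchmark(operation: Callable[[int, str], None],
                  rows: Iterable[Tuple[int, str]]) -> BenchmarkResult:
    """Execute `operation` once per row and collect timing statistics."""
    latencies_ms = []
    start = time.perf_counter()
    for key, value in rows:
        t0 = time.perf_counter()
        operation(key, value)
        latencies_ms.append((time.perf_counter() - t0) * 1000.0)
    total = time.perf_counter() - start
    count = len(latencies_ms)
    if count == 0:
        raise ValueError("rows must contain at least one item")
    return BenchmarkResult(
        operations=count,
        total_time_s=total,
        op_rate=count / total if total > 0 else float("inf"),
        mean_latency_ms=sum(latencies_ms) / count,
        max_latency_ms=max(latencies_ms),
    )


# Hypothetical stand-in for a database: a plain dict.
store = {}


def write(key: int, value: str) -> None:
    store[key] = value


def write_then_read(key: int, value: str) -> None:
    store[key] = value
    _ = store[key]  # read back the value that was just written


# The same data set is reused for every run, as the article requires.
rows = [(i, f"value-{i}") for i in range(10_000)]
write_result = run_benchmark(write, rows)
write_read_result = run_benchmark(write_then_read, rows)
print(write_result)
print(write_read_result)
```

Keeping the timing loop separate from the operation being timed means the same loop can be pointed at either database, so both are measured in exactly the same way.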
We’ll build diagrams from the parameters that describe the database's performance in general. Each group of parameters is shown as a bar plot.
The first diagram describes the latency parameters. As we can see, the average latency values are approximately equal for both databases, but the maximum values are higher for Cassandra. The maximum latency values apply to only a small amount of data and therefore cannot describe the database's overall performance. This means that, by this parameter, the two databases perform approximately the same.
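To see why a maximum latency value says little about overall performance, consider the following small sketch. The latency samples are invented for the illustration and are not measurements from the benchmark: 999 requests that take 1 ms each, plus a single slow request of 200 ms.

```python
from statistics import mean

# Hypothetical latency samples in milliseconds (not benchmark data):
# 999 typical requests and one slow outlier.
typical = [1.0] * 999
with_outlier = typical + [200.0]

mean_typical = mean(typical)
mean_with_outlier = mean(with_outlier)
max_typical = max(typical)
max_with_outlier = max(with_outlier)

print(f"mean: {mean_typical:.3f} ms -> {mean_with_outlier:.3f} ms")
print(f"max:  {max_typical:.3f} ms -> {max_with_outlier:.3f} ms")
```

A single outlier moves the mean from 1.0 ms to about 1.2 ms, while the maximum jumps from 1 ms to 200 ms. The maximum reflects one request out of a thousand, which is why the average is the more representative figure here.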
The remaining diagrams describe the operation rate parameters and the total measured time. As we can see, Scylla's performance by these parameters is more than 6 times greater than Cassandra's.
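For reference, a "times greater" factor of this kind can be derived from the operation count and the total time, as sketched below. The helper names `operation_rate` and `speedup` and all of the numbers are placeholders made up for the example, not values from the tests.

```python
def operation_rate(operations: int, total_time_s: float) -> float:
    """Operations per second for a run of `operations` taking `total_time_s`."""
    return operations / total_time_s


def speedup(rate_a: float, rate_b: float) -> float:
    """How many times higher `rate_a` is than `rate_b`."""
    return rate_a / rate_b


# Placeholder inputs (not benchmark results): the same number of
# operations finished in 20 s by database A and in 120 s by database B.
operations = 1_000_000
rate_a = operation_rate(operations, 20.0)
rate_b = operation_rate(operations, 120.0)
factor = speedup(rate_a, rate_b)
print(f"A: {rate_a:.0f} op/s, B: {rate_b:.0f} op/s, factor: {factor:.1f}")
```

When both runs execute the same number of operations, the ratio of the operation rates equals the inverse ratio of the total times, so the two groups of parameters tell the same story.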
In general, we can conclude that Scylla's data writing performance is more than 6 times higher than Cassandra's. For a more complete analysis of database performance, let's perform a similar test for the write/read procedure.
The write/read test includes the same data processes as the previous test. The only difference is in the basic operation of the test: it writes the data and then reads it back. For this reason, let's present only the test results as a table and diagrams, and then analyze them.
Analyzing the results, we can conclude that the write/read test results follow the same pattern as the previous test: latency is approximately the same, while operation rate and total time are approximately 2.5 times better for Scylla. Therefore, we can say that write/read performance is higher for Scylla than for Cassandra.
In this article, we compared the performance of two popular databases, Scylla and Cassandra, using benchmarks. The tests covered the data writing and writing/reading procedures, were based on the required groups of parameters, and showed that, in general, Scylla is faster than Cassandra.