MySQL cluster write performance with Galera (MariaDB)

Tags: galera, mariadb, mysql

I am currently evaluating a migration from our current database server to a three-node cluster.

Our current server is a VM running MariaDB 10.1 with asynchronous replication to another server in order to perform backups of the database without any performance hits.

The read-to-write ratio is about 70/30; we are running an IoT platform with many sensors sending raw data that is written to the database before it is consolidated and presented to the end user.

Because writes are a substantial share of our load, I want to make sure that the cluster can perform at least as well as our current server on writes before migrating.

For this test, I started a VM on Google Compute Engine with 16 vCPUs, 60GB of RAM, and local SSD storage rated at 10,000 read IOPS and 15,000 write IOPS.

On this VM running Debian 8.3 I installed MariaDB 10.1 (which includes the Galera cluster), in order to run some benchmarks with Sysbench 0.5.

These are the MySQL parameters that I changed from the default mysql.cnf file:

[mysqld]
binlog_format=ROW
innodb_autoinc_lock_mode=2
innodb_flush_log_at_trx_commit=0
innodb_log_file_size=2G
innodb_buffer_pool_size=4G
sync_binlog=0

When running the update test from sysbench, I am noticing a huge overhead from enabling wsrep on the single node that I have currently deployed.

This is the command that I am using to run the benchmark:

sysbench --db-driver=mysql --num-threads=32 --max-time=30 --max-requests=0 --oltp-tables-count=32 --oltp-table-size=100000 --oltp-test-mode=complex --test=/root/sysbench/sysbench/tests/db/update_index.lua --mysql-db=sbtest --oltp-skip-trx=on run

With wsrep_on=OFF, I get around 54000 updates per second on the benchmark.

Setting wsrep_on to ON, I get around 19000 updates per second.

I was expecting a performance hit from enabling synchronous replication, but I did not expect it to be this much.

It doesn't look like the bottleneck is the CPU or the I/O, so why is the performance hit from enabling wsrep so high?

Best Answer

Data Warehousing (your app) and OLTP are somewhat different. So, I question the relevancy of the benchmark. OLTP, if I am not mistaken, involves very short 'transactions', which can suffer from the cross-node syncing done at COMMIT by Galera's wsrep. On the other hand, your insertion of new datapoints could be done in batches of, say, 1000, and not have to worry about the commit overhead.
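The batching idea above can be sketched as follows. This is a minimal illustration, not code from the post: the table and column names (sensor_readings, sensor_id, ts, value) are invented, and the point is only the chunking, so that Galera certifies one transaction per 1000 rows instead of one per row.

```python
# Sketch: group raw sensor rows into multi-row INSERT statements so each
# batch is one transaction (one Galera certification) instead of one
# COMMIT per row. Table/column names are hypothetical examples.

def batch_insert_statements(rows, batch_size=1000):
    """Return (sql, params) pairs, one multi-row INSERT per batch."""
    statements = []
    for i in range(0, len(rows), batch_size):
        chunk = rows[i:i + batch_size]
        placeholders = ",".join("(%s,%s,%s)" for _ in chunk)
        sql = ("INSERT INTO sensor_readings (sensor_id, ts, value) "
               "VALUES " + placeholders)
        params = [v for row in chunk for v in row]  # flatten row tuples
        statements.append((sql, params))
    return statements

# 2500 raw readings become 3 statements: 1000 + 1000 + 500 rows.
rows = [(1, "2016-01-01 00:00:00", 3.14)] * 2500
stmts = batch_insert_statements(rows)
```

Each (sql, params) pair would then be handed to the client library's execute call, one transaction per statement.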

With 60GB of RAM, innodb_buffer_pool_size should be more like 45G. Also set innodb_buffer_pool_instances=16.
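For the 60GB VM in the question, that advice would look like this in the config file; these are starting points to tune, not measured optima:

```ini
[mysqld]
# ~75% of RAM for the buffer pool on a dedicated database server
innodb_buffer_pool_size = 45G
# split the pool to reduce mutex contention on a 16-vCPU machine
innodb_buffer_pool_instances = 16
```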

On the surface, 19K/sec (per node?) seems faster -- 19K*3 > 54K. But maybe I am misreading things.

Let's see your SHOW CREATE TABLE for the Fact table. There are a number of mistakes that can cause severe performance problems in your type of application.

When do you do the "consolidating"? And how? A summary table (which I assume is what you are talking about) is key to efficiency. Sometimes it is even better to store the summary data, yet toss the raw data.
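A summary table along those lines might look like the sketch below. The schema and names are hypothetical (invented for illustration, not taken from the question); the idea is one row per sensor per hour, maintained incrementally from batches of raw rows.

```sql
-- Hypothetical summary table: one row per sensor per hour.
CREATE TABLE sensor_hourly (
    sensor_id     INT      NOT NULL,
    hr            DATETIME NOT NULL,   -- timestamp truncated to the hour
    reading_count INT      NOT NULL,
    value_sum     DOUBLE   NOT NULL,   -- AVG = value_sum / reading_count
    value_min     DOUBLE   NOT NULL,
    value_max     DOUBLE   NOT NULL,
    PRIMARY KEY (sensor_id, hr)
) ENGINE=InnoDB;

-- Fold a batch of raw rows (from a hypothetical sensor_readings table)
-- into the summary, accumulating if the hour's row already exists.
INSERT INTO sensor_hourly
SELECT sensor_id,
       DATE_FORMAT(ts, '%Y-%m-%d %H:00:00'),
       COUNT(*), SUM(value), MIN(value), MAX(value)
FROM sensor_readings
WHERE ts >= @batch_start AND ts < @batch_end
GROUP BY sensor_id, DATE_FORMAT(ts, '%Y-%m-%d %H:00:00')
ON DUPLICATE KEY UPDATE
    reading_count = reading_count + VALUES(reading_count),
    value_sum     = value_sum + VALUES(value_sum),
    value_min     = LEAST(value_min, VALUES(value_min)),
    value_max     = GREATEST(value_max, VALUES(value_max));
```

With a table like this, end-user queries hit the small summary table, and the raw rows can be aged out or dropped entirely.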

Check out my discussions of High speed ingestion and Summary tables. My point in those is that significant speedups can (and should) be made aside from the underlying structure (hardware, Galera, etc.).

Your Galera settings are in the right direction; there may be extra tips in Galera.

More questions (which will lead to more discussion):

- How many clients are pumping data into the table?
- How many rows are inserted at once?
- How easy would it be to gather the rows and batch-insert them?
- Is any Normalization occurring during the insertion?
- Does it make sense to do the summarization in parallel with the insertion?
- Did you intend to have clients writing to all 3 nodes?
- Do you have some form of Proxy between the clients and the nodes?