MariaDB – Why is disk IO higher on Debian 10 (MariaDB 10.3) with MySQL replication?

debian, mariadb-10.3, master-master-replication, replication

I have a MySQL/MariaDB master-master replication setup that has been working well for several years; the database is not very large (under 200 MB across 18 tables). It ran on 2 servers with Debian 9 and MariaDB 10.1.44. Now I've spun up 2 new servers running Debian 10 and I'm in the process of moving things over to them, but I stopped half-way because I'm seeing much higher disk IO usage on the new servers (about 6x more).

So currently, one of the Debian 9 servers and one of the Debian 10 servers are in a master-master relationship, with the other Debian 9 server still a slave of the Debian 9 master, and the same arrangement on the Debian 10 side of things.

I didn't notice the increased disk IO until after all read/write operations were moved to the Debian 10 master. I was trying to browse tables and saw how slowly the query results came back; it felt like watching rows scroll past over a dial-up connection. It turned out that disk contention on the virtualization host was partly responsible, and that problem is now mostly gone.

Now, as you can imagine, none of this is crashing the server with such a "small" set of tables, but as things continue to grow, I'm concerned there is some underlying misconfiguration that will rear its ugly head at an inopportune time. On the Debian 9 servers, iotop shows steady write IO of around 300-600 KB/s, but on Debian 10 it spikes as high as 6 MB/s and averages around 3 MB/s.
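For reference, numbers like these can be sampled with something along these lines (the exact flags here are just one reasonable choice; iotop and iostat come from the iotop and sysstat packages):

# Per-process writes and per-device throughput, sampled every 5 seconds.
iotop -o -d 5
iostat -xm 5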

Here is the config that is common to all 4 servers; everything else is at the Debian (or MariaDB, as the case may be) defaults. The full config for Debian 10 is at https://pastebin.com/Lk2FR4e3:

max_connections = 1000
query_cache_limit       = 4M
query_cache_size        = 0
query_cache_type        = 0
server-id               = 1 # different for each server
log_bin                 = /var/log/mysql/mysql-bin.log
binlog_do_db            = optimizer
replicate-do-db         = optimizer
report-host             = xyz.example.com #changed obviously
log-slave-updates       = true
innodb_log_file_size    = 32M
innodb_buffer_pool_size = 256M
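
Since log_bin is enabled everywhere, one of the first things worth checking is how fast the binary logs themselves are growing, because binlog writes alone could account for a chunk of the write IO. Something like this gives a rough rate (the 60-second interval is arbitrary):

# Total on-disk size of the binlogs now and again in 60 seconds;
# the difference is roughly the binlog write rate.
du -sb /var/log/mysql/; sleep 60; du -sb /var/log/mysql/

# Or ask the server directly (File_size is in bytes, per binlog file).
mysql -e "SHOW BINARY LOGS;"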

Here are some other settings I've tried that don't seem to make any difference (I checked them one by one; see the quick verification sketch just below the list):

binlog_annotate_row_events = OFF
binlog_checksum = NONE
binlog_format = STATEMENT
innodb_flush_method = O_DIRECT_NO_FSYNC
innodb_log_checksums = OFF
log_slow_slave_statements = OFF
replicate_annotate_row_events = OFF

I've gone through all of the replication and binary log settings that changed from MariaDB 10.1 to 10.3 (https://mariadb.com/kb/en/replication-and-binary-log-system-variables/) and can't find any that make a difference.

I also did a full listing of the server variables and compared the 10.1 configuration to the 10.3 configuration, and didn't find anything obvious. Either I'm missing something, or the problem lies with Debian 10 itself.
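
For anyone repeating that comparison, it boils down to diffing full variable dumps from the two versions; a minimal sketch (the host names are placeholders for the actual servers):

# Dump every server variable from the old and new master, then diff.
mysql -h debian9-master  -NBe "SHOW GLOBAL VARIABLES;" | sort > vars-10.1.txt
mysql -h debian10-master -NBe "SHOW GLOBAL VARIABLES;" | sort > vars-10.3.txt
diff vars-10.1.txt vars-10.3.txt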

Results of SHOW ENGINE INNODB STATUS are here: https://pastebin.com/mJdLQv8k

Now, how about that disk IO: what is it actually doing? Here are 3 screenshots to show what I mean by increased disk IO:
Resource graphs on the Debian 10 master

That is from the Debian 10 master, and you can see where I moved operations back to the Debian 9 server (more on that in a second). Notice the disk IO does go down slightly at that point, but not to the levels that we'll see on the Debian 9 master. Also note that the public bandwidth chart is pretty much only replication traffic, and that the disk IO far outstrips the replication traffic. The private traffic is all the reads/writes from our application servers.

Resource graphs on Debian 9 master

This is the Debian 9 master server, and you can see where I moved all operations back to it: the private traffic shoots up, but the write IO still hovers around 500 KB/s. I didn't have resource graphs being recorded on the old servers, hence the missing bits on the left.

Debian 10 slave server resource graphs

And lastly, for reference, here is the Debian 10 slave server (which will eventually be half of the master<->master pair). There are no direct reads/writes on this server; all disk IO is from replication.

Just to see what would happen (as I alluded to above), I reverted all direct read/write operations to the Debian 9 master server. While disk IO did fall somewhat on the Debian 10 server, it did not grow on the Debian 9 server to any noticeable extent.

Also, on the Debian 10 slave server, I ran STOP SLAVE once to see what happened, and the disk IO dropped to almost nothing. Doing the same on the Debian 10 master did not have the same drastic effect, though it's possible there WAS some change that wasn't obvious; the disk IO numbers from iostat fluctuate much more wildly on the Debian 10 servers than they do on the Debian 9 servers.

UPDATE: after moving all read/write/update operations OFF the Debian 10 master, a STOP SLAVE command has exactly the same effect as it did on the Debian 10 slave.

UPDATE 2: the more I look at this, the more I think it has nothing to do with replication. It seems that replication simply magnifies the effects of this problem.
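
To try to pin down where the writes are going, independent of replication, one approach is to compare MariaDB's own cumulative write counters with the per-process writes the OS reports (pidstat is from the sysstat package); for example:

# MariaDB's view: cumulative bytes written to data files, redo log and binlog.
# Take two samples a minute apart and subtract to get a rate.
mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN
  ('Innodb_data_written','Innodb_os_log_written','Binlog_bytes_written');"

# The OS's view: read/write KB/s for the mysqld process, 5-second samples.
pidstat -d -p "$(pidof mysqld)" 5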

So, what is going on here? How can I figure out why MariaDB is apparently writing so much data to disk, and/or how can I stop it?

Thanks in advance!

Best Answer

This seems to be a performance regression (awaiting a fix), but it can also happen if your innodb_io_capacity is set too high. In my case, innodb_io_capacity is already at 200 (the default), and lowering it further does not change anything, so I'll continue to wait for a fix in 10.5 (though perhaps they'll back-port it).
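
If you want to test the innodb_io_capacity angle on your own setup, it can be inspected and lowered at runtime; a quick sketch (a SET GLOBAL change is lost on restart unless it also goes into the config file):

# Current values (innodb_io_capacity and innodb_io_capacity_max).
mysql -e "SHOW GLOBAL VARIABLES LIKE 'innodb_io_capacity%';"

# Lower it at runtime to see whether background flushing calms down.
mysql -e "SET GLOBAL innodb_io_capacity = 100;"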