MySQL – Identifying cause of high I/O

mysql-5.5 percona-server performance performance-tuning

I'm trying to diagnose the bottleneck behind high write I/O activity on a Percona master (with replication to a slave). The pattern of behaviour is roughly cyclic and regular: 60 minutes of 75-90% write-based I/O utilisation followed by 10 minutes of ~25%. Occasionally this spikes to the point that the fronting web service is unresponsive for a couple of minutes.

Doing a SHOW PROCESSLIST shows the replication process and then a number of idle threads (though occasionally there are 1-2 COMMITs which complete by the time I reissue the process list command) but nothing indicating a specific active query.
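
To try to catch those transient statements, one option is to sample the process list in a tight loop and keep only the non-idle entries. This is just a rough sketch: it assumes the local mysql client can connect (e.g. via credentials in ~/.my.cnf) with the PROCESS privilege, and that one-second sampling is frequent enough to catch the culprits.

# Sample the full process list once a second; drop idle (Sleep) threads and
# the replication dump thread, and log whatever real statements remain.
while true; do
    mysql -N -e "SHOW FULL PROCESSLIST" \
        | grep -v -e 'Sleep' -e 'Binlog Dump' >> /tmp/active-queries.log
    sleep 1
done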

Innodb buffer status variables:

mysql> SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_%';
+-----------------------------------------+-----------+
| Variable_name                           | Value     |
+-----------------------------------------+-----------+
| Innodb_buffer_pool_pages_data           | 1678714   |
| Innodb_buffer_pool_pages_dirty          | 718       |
| Innodb_buffer_pool_pages_flushed        | 195847729 |
| Innodb_buffer_pool_pages_LRU_flushed    | 0         |
| Innodb_buffer_pool_pages_free           | 5299854   |
| Innodb_buffer_pool_pages_made_not_young | 0         |
| Innodb_buffer_pool_pages_made_young     | 59369     |
| Innodb_buffer_pool_pages_misc           | 99319     |
| Innodb_buffer_pool_pages_old            | 619661    |
| Innodb_buffer_pool_pages_total          | 7077887   |
+-----------------------------------------+-----------+
10 rows in set (0.00 sec)

SHOW ENGINE INNODB STATUS gives:

Buffer pool hit rate 1000 / 1000, young-making rate 0 / 1000 not 0 / 1000

The mysqld section from my.cnf is

[mysqld]

user                           = mysql
default-storage-engine         = InnoDB
socket                         = /var/run/mysqld/mysqld.sock
pid-file                       = /var/run/mysqld/mysqld.pid

key-buffer-size                = 32M
myisam-recover                 = FORCE,BACKUP

max-allowed-packet             = 16M
max-connect-errors             = 1000000
innodb                         = FORCE

datadir                        = /srv/mysql/

server_id                      = 30
log-bin                        = /srv/mysql/mysql-bin
expire-logs-days               = 14
sync-binlog                    = 1 

tmp-table-size                 = 32M
max-heap-table-size            = 32M
query-cache-type               = 0 
query-cache-size               = 0 
max-connections                = 500
thread-cache-size              = 50
open-files-limit               = 65535
table-definition-cache         = 1024
table-open-cache               = 2048
tmpdir                         = /tmp

innodb-flush-method            = O_DIRECT
innodb-log-files-in-group      = 2 
innodb-log-file-size           = 512M
innodb-flush-log-at-trx-commit = 1
innodb-file-per-table          = 1 
innodb-buffer-pool-size        = 108G

log-error                      = /var/log/mysql/mysql-error.log
log-queries-not-using-indexes  = 1
slow-query-log                 = 1 
slow-query-log-file            = /var/log/mysql/mysql-slow.log

The slow query log has a number of entries, but none of them are long-running. A sample extract is:
use xxxxxxxxx;
SET timestamp=1411484104;
SELECT amnews_article.aid, amnews_article.sid_id, amnews_article.fid_id, amnews_article.url, amnews_article.url_hash, amnews_article.title, amnews_article.sha1, amnews_article.date_ts, amnews_article.summary, amnews_article.author_feed, amnews_article.author, amnews_article.language, amnews_article.fetched, amnews_article.processed, amnews_article.ngrams, amnews_article.indexed_ngram, amnews_article.topics_ngram, amnews_article.journals, amnews_article.persons, amnews_article.links, amnews_article.dois, amnews_article.verified, amnews_article.journals_version, amnews_article.duplicate_of_id, amnews_article.external, amnews_article.blog, amnews_article.last_updated, amnews_article.grouped FROM amnews_article ORDER BY amnews_article.aid DESC LIMIT 1;

As an aside, I'm not sure why that query makes it into the slow query log, because it runs quickly and uses an index, as shown by an EXPLAIN:

+----+-------------+----------------+-------+---------------+---------+---------+------+------+-------+        
| id | select_type | table          | type  | possible_keys | key     | key_len | ref  | rows | Extra |
+----+-------------+----------------+-------+---------------+---------+---------+------+------+-------+        
|  1 | SIMPLE      | amnews_article | index | NULL          | PRIMARY | 4       | NULL |    1 |       |        
+----+-------------+----------------+-------+---------------+---------+---------+------+------+-------+      

The underlying disks are spinning platters configured as an ext4 LVM partition on RAID 1, and the machine is physical (albeit managed) rather than a VM.
The data partition is on the same disks (LVM volume group) as the log partition. RAM usage is ~90% of 128G, and swap is available but not being used.

$ free -m
             total       used       free     shared    buffers     cached
Mem:        128912     117370      11542          0        516      36092
-/+ buffers/cache:      80761      48151
Swap:         2045          0       2045

Version is 5.5.36-34.2-648.precise-log

Edit to add info asked for by Vérace:

Is your web server on the same box?
The webserver is on a different box from the db.

Are there cron jobs?
There are DB-related cron jobs, but they run from various hosts. (With the queries not showing up in the process list, though, I don't know where to direct attention.)

How are you determining that there's an I/O bottleneck?
We use New Relic for server and service monitoring. It produces graphs that show the long periods of high I/O usage, flatlining at 100 MB/s. iotop -ao over ~10 minutes shows that the processes responsible are mysql. (Well, jbd2, the ext4 journal thread, is at the very top, but presumably that's expected with the disk setup.)
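
To line those iotop numbers up with what InnoDB itself reports, something along these lines can help (a sketch only; the 10-second interval is arbitrary and the counter names are the 5.5 ones):

# -r prints the difference from the previous sample, -i10 repeats every 10s,
# so each line shows the bytes/pages written in the last 10 seconds.
mysqladmin extended-status -r -i10 \
    | grep -e 'Innodb_data_written' \
           -e 'Innodb_os_log_written' \
           -e 'Innodb_buffer_pool_pages_flushed' \
           -e 'Innodb_dblwr_pages_written'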

My questions are twofold:
What is MySQL doing writing so much data if nothing is shown in the process list, i.e. how can I find out which queries are responsible? (One approach via the binary log is sketched below.)
Are there any configuration changes I can make to alleviate the almost-perpetual high I/O?
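
Since the binary log is already enabled, one thing I could try is decoding a binlog that covers one of the busy windows and counting write events per table. A rough sketch, where the numbered file name is a placeholder for a real file from /srv/mysql/ (the --verbose/--base64-output=decode-rows options only matter if the server uses row-based logging):

# Count write events per table in one binlog file; the file name below is a
# placeholder, so substitute a binlog that covers a high-I/O period.
mysqlbinlog --base64-output=decode-rows --verbose /srv/mysql/mysql-bin.000123 \
    | grep -o -i -E 'INSERT INTO [^ (]+|UPDATE [^ ]+|DELETE FROM [^ ]+' \
    | sort | uniq -c | sort -rn | head -20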

Best Answer

This kind of sounds like poor flushing behavior; ideally you'd be seeing consistent I/O, not stretches of 90% and 25%.

  • What is the adaptive flushing method in use (or is it off)? keep_average?
  • Is flushing of neighbor pages enabled? Sometimes this helps, sometimes it really isn't efficient. Generally it's good on spinning disks, so probably not the issue here.
  • I would say the buffer pool is too large at 108G out of 128G, but I assume this was benchmarked at some level.
  • How many dirty pages are there?
  • innodb_write_io_threads?
  • innodb_read_io_threads?
  • innodb_io_capacity? (A query to check these settings in one go is sketched below.)
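
Most of those can be pulled in one round trip. A sketch (innodb_adaptive_flushing_method and innodb_flush_neighbor_pages are the Percona Server 5.5 spellings as I recall them, so adjust if your build names them differently; unknown names simply return no row):

-- Flushing / I/O-related settings the list above asks about
SHOW GLOBAL VARIABLES WHERE Variable_name IN
    ('innodb_adaptive_flushing', 'innodb_adaptive_flushing_method',
     'innodb_flush_neighbor_pages', 'innodb_max_dirty_pages_pct',
     'innodb_io_capacity', 'innodb_read_io_threads',
     'innodb_write_io_threads', 'innodb_stats_on_metadata');

-- Current dirty-page count (718 in the status output above)
SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages_dirty';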

It might be a big help to set this:

  • innodb_stats_on_metadata = 0

http://www.percona.com/blog/2013/12/03/innodb_stats_on_metadata-slow-queries-information_schema/
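
If that setting turns out to be ON, it is dynamic, so it can be turned off without a restart and then persisted in the config. A sketch, using the same dash style as the [mysqld] section above:

-- Takes effect immediately; no restart required
SET GLOBAL innodb_stats_on_metadata = 0;

and in my.cnf so it survives a restart:

innodb-stats-on-metadata       = 0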