MySQL – DB randomly locks up when innodb_buffer_pool_size is set high

innodb, locking, MySQL

I have been struggling to work out what could be happening here. For many months, since upgrading to MySQL 5.6, we have been experiencing random lag spikes in page loads, which have been tracked back to MySQL and specifically to the innodb_buffer_pool_size setting.

The server is as follows:

  • CentOS 6 64bit cpanel/WHM
  • Dual Intel Xeon E5-2620 – Hex Core 2.0GHz, 15MB Cache, HyperThreaded [12 cores, 24 threads]
  • 24GB RAM
  • 2 x 240GB SSD
  • Apache 2.4
  • PHP 5.5 (fcgid handler)
  • MySQL 5.6

We are running a Magento site that gets about 1,500 visitors a day with a DB size of about 5GB. Nothing major by any stretch.

Using New Relic we can see that we can comfortably allocate 12GB of RAM to innodb_buffer_pool_size. However, about 24 hours after doing this we start to get random load spikes where pages may take tens of seconds to minutes to load. If you hit refresh in the middle of such a spike, the page loads in under a second as normal.

This problem became especially apparent when we tried to use php-fpm as the PHP handler so as to properly utilise OPcache. When we tried this, the PHP processes would get backed up during a MySQL lock and process numbers would ramp up until the site crashed. It became so unstable that we had to revert to fcgid, which just stalls rather than crashes.

Looking at New Relic we can see the load-time spikes in MySQL; however, server memory and CPU usage are still nowhere near exhausted.

Oddly enough, leaving innodb_buffer_pool_size at its default seemed to make the problem go away for at least a week or so. Once MySQL's RAM usage starts to hit the 900MB to 1GB range the spikes start happening again and we have to restart MySQL. Then we are good for another week or so.

With the default MySQL settings, New Relic shows about 80% of the RAM is free.
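
For anyone wanting to double-check the same thing: the MySQL 5.6 default for innodb_buffer_pool_size is only 128M, and the value the running server is actually using can be confirmed with a couple of standard status queries (run from the shell here, adjust credentials as needed):

mysql -e "SHOW VARIABLES LIKE 'innodb_buffer_pool%';"
mysql -e "SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_pages%';"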

Below is our current my.cnf (as you can see, we have commented out the innodb buffer pool lines and all logging):

[mysql]

# CLIENT #
port                           = 3306
socket                         = /var/lib/mysql/mysql.sock

[mysqld]

# GENERAL #
user                           = mysql
default-storage-engine         = InnoDB
socket                         = /var/lib/mysql/mysql.sock

# MyISAM #
key_buffer_size                = 64M
myisam_recover_options         = FORCE,BACKUP

# SAFETY #
max_allowed_packet             = 16M
max_connect_errors             = 1000000
skip-name-resolve
innodb                         = FORCE

# DATA STORAGE #
datadir                        = /var/lib/mysql/

# BINARY LOGGING #
#log-bin                        = /var/lib/mysql/mysql-bin
#expire_logs_days               = 14
#sync_binlog                    = 1

# CACHES AND LIMITS #
wait_timeout                   = 300
query_cache_type               = 0
query_cache_size               = 0
max_connections                = 500

# INNODB #
innodb_log_file_size           = 256M # if changing, stop database, remove old log files, then start!
innodb_file_per_table          = 1
#innodb_buffer_pool_size        = 12G
#innodb_buffer_pool_instances   = 12

# LOGGING #
#log-error                      = /var/lib/mysql/mysql-error.log
#log-queries-not-using-indexes  = 1
#slow-query-log                 = 1
#slow_query_log_file            = /var/lib/mysql/mysql-slow.log

We have gone to our hosting providers and asked them for help. Initially they told me to raise innodb_buffer_pool_instances as they thought there may have been some lock contention. I set it from 12 to 24 and the problem persisted.
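
For what it's worth, that kind of contention usually shows up in the SEMAPHORES section of the InnoDB status output; a quick, rough way to peek at it (the number of lines grabbed is arbitrary):

mysql -e "SHOW ENGINE INNODB STATUS\G" | grep -A 12 SEMAPHORES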

Then they ran a memtest to check for issues with the RAM and found nothing. Eventually they gave up and told us to consult a DB expert.

I just can't work it out. Any help would be greatly appreciated.

************ UPDATE 1 *************

So I have been digging into this issue further and further and have made some new discoveries.

I un-commented innodb_buffer_pool_size = 12G and began monitoring the output of SHOW PROCESSLIST. I noticed that during lag times queries, not any in particular, would be stuck in a "Writing to net" state, sometimes for minutes at a time. Those same queries would execute in milliseconds from the CLI or MySQL Workbench.
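
A filtered version of the same processlist check is easier to watch than the full SHOW PROCESSLIST output (the 5-second cut-off here is arbitrary):

mysql -e "SELECT id, time, state, LEFT(info, 100) AS query
          FROM information_schema.processlist
          WHERE state = 'Writing to net' AND time > 5
          ORDER BY time DESC;"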

To try and ascertain more about what was going on with the server at the time of these slow "Writing to net" states, I installed Percona Toolkit and set up a pt-stalk daemon to monitor the server and trigger a collection whenever a query sat in a "Writing to net" state for longer than 5 seconds.
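
The pt-stalk set-up was roughly along these lines (paths and numbers are illustrative; pt-stalk's --threshold counts matching processlist rows rather than seconds, so --cycles 5 with --interval 1 is only an approximation of "stuck for more than 5 seconds"):

pt-stalk --daemonize \
    --function processlist --variable State --match "Writing to net" \
    --threshold 0 --cycles 5 --interval 1 \
    --dest /var/lib/pt-stalk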

Looking at the pt-stalk output files, what seemed very interesting was the output in the opentables1 and opentables2 files, which always says…

2015_12_06_05_01_04 Too many open tables: 2135

The number of open tables varies but always seems to be well above 1000. Initially I took this as an error message and raised the server's open files ulimit, which did not fix the problem. Then I found this bug report, https://bugs.launchpad.net/percona-toolkit/+bug/1307377, which explains that this is not an error but a warning that pt-stalk fires when there are more than 1000 open tables.
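
To put those numbers in context, they can be compared against the table cache setting with a couple of standard checks:

mysql -e "SHOW GLOBAL STATUS WHERE Variable_name IN ('Open_tables', 'Opened_tables');"
mysql -e "SHOW VARIABLES LIKE 'table_open_cache';"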

One other thing pt-stalk was able to tell me was the exact time the lag started to occur. I noticed that the slow "Writing to net" states always start as soon as MySQL hits 4.8 to 5GB of RAM usage. 4.8GB happens to be the size of the tmp disk partition cPanel has set up. Although this partition seems to have plenty of free space, with only ~380MB used at any time, my gut feeling tells me I should try increasing it to see if that helps.
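
For anyone checking the same thing, the partition size and MySQL's temp directory can be confirmed with the following (assuming cPanel's tmp partition is mounted on /tmp, its usual location):

df -h /tmp
mysql -e "SHOW VARIABLES LIKE 'tmpdir';"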

That will be my next step and I will report the results back here.

If there is anything else that anybody thinks I should try given the above information, please do share.


Best Answer

This issue was caused by our outdated kernel (Linux 2.6.32-358.18.1.el6.x86_64) and was resolved by upgrading the kernel to the latest version (kernel.x86_64 2.6.32-573.12.1.el6).
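
For anyone on CentOS 6 hitting the same thing, checking and updating the kernel is straightforward (the new kernel only takes effect after a reboot):

uname -r               # show the currently running kernel
yum update kernel      # install the latest kernel package from the configured repos
reboot                 # boot into the new kernel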