MySQL crashes and currently the error.log file is huge

innodb, MySQL, optimization

I created a droplet on DigitalOcean using Laravel Forge 6 months ago. Two weeks ago we decided it was time to upgrade the droplet and moved from 4GB RAM/2 CPUs to 16GB RAM/6 CPUs, and for the past few days the MySQL server just crashes; the only way to make it work again is by rebooting the server (MySQL makes the server unresponsive).

When I run htop to see the list of processes, it shows several instances of /usr/sbin/mysqld --daemonize --pid-file=/run/mysqld/mysql.pid (currently more than 30 entries like that).

The error log is bigger than 1GB (yes, I know!) and shows this message hundreds of times:

[Warning] InnoDB: Difficult to find free blocks in the buffer pool (21 search iterations)! 21 failed attempts to flush a page! Consider increasing the buffer pool size. It is also possible that in your Unix version fsync is very slow, or completely frozen inside the OS kernel. Then upgrading to a newer version of your operating system may help. Look at the number of fsyncs in diagnostic info below. Pending flushes (fsync) log: 0; buffer pool: 0. 167678974 OS file reads, 2271392 OS file writes, 758043 OS fsyncs. Starting InnoDB Monitor to print further diagnostics to the standard output.

The only thing that changed recently is that we now send weekly notifications to customers (only the ones who subscribed to them) to let them know about certain events happening in the current week. This is a fairly intensive process, because we have a few thousand customers, but we take advantage of Laravel Queues to process everything.

I've tried changing innodb_buffer_pool_size from the default value to 80% of available RAM (~13GB, config sketch below), and instead of the previous message it now shows:

"InnoDB: page_cleaner: 1000ms intended loop took 4228ms. The settings might not be optimal.".

And this change made the database run slower. For example, processing 30k records (the notifications job I mentioned) took 6 hours, whereas before the change it was taking around 3 (when it didn't crash).

Is this a MySQL-settings related issue?

EDIT: Global Status and Variables after the suggested innodb_* changes

Show Variables and Show Global Status
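(For reference, those two dumps are simply the output of the standard commands, in case anyone wants to compare against their own server:)

SHOW GLOBAL VARIABLES;
SHOW GLOBAL STATUS;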

Best Answer

For the 4GB droplet, set innodb_buffer_pool_size to 1500M in the config and restart.

For the 16GB droplet, change the config:

innodb_buffer_pool_size = 12G
innodb_buffer_pool_instances = 12
innodb_page_cleaners = 12
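A side note, assuming MySQL 5.7 (which allows online buffer-pool resizing): innodb_buffer_pool_size itself can be changed at runtime, while innodb_buffer_pool_instances and innodb_page_cleaners only take effect at startup, so one restart with all three in my.cnf is still the simplest path. For a quick test beforehand:

SET GLOBAL innodb_buffer_pool_size = 12 * 1024 * 1024 * 1024;  -- 12G, resized online in 5.7+ (rounded up to a multiple of the chunk size)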

Revised Analysis (After running more than a day)

Observations:

  • Version: 5.7.24-0ubuntu0.18.04.1-log
  • 16 GB of RAM
  • Uptime = 1d 02:59:16
  • You are not running on Windows.
  • Running 64-bit version
  • You appear to be running entirely (or mostly) InnoDB.

The More Important Issues:

There are a lot of table scans, and many of them are big. This may be interfering with other InnoDB operations, hence indirectly stalling the page_cleaners.

Change to innodb_lru_scan_depth = 256 as a possible solution to the page_cleaner problem.
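innodb_lru_scan_depth is dynamic, so it can be tried at runtime before being persisted (a sketch, using the value suggested above):

SET GLOBAL innodb_lru_scan_depth = 256;  -- takes effect immediately, no restart needed

To keep it across restarts, also add innodb_lru_scan_depth = 256 under [mysqld] in my.cnf.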

Details and other observations:

( innodb_lru_scan_depth ) = 1,024 -- "InnoDB: page_cleaner: 1000ms intended loop took ..." may be fixed by lowering lru_scan_depth

( Innodb_buffer_pool_pages_free / Innodb_buffer_pool_pages_total ) = 511,813 / 786384 = 65.1% -- Pct of buffer_pool currently not in use -- innodb_buffer_pool_size is bigger than necessary?

( Innodb_buffer_pool_bytes_data / innodb_buffer_pool_size ) = 4,456,398,848 / 12288M = 34.6% -- Percent of buffer pool taken up by data -- A small percent may indicate that the buffer_pool is unnecessarily big.

( innodb_print_all_deadlocks ) = innodb_print_all_deadlocks = OFF -- Whether to log all Deadlocks. -- If you are plagued with Deadlocks, turn this on. Caution: If you have lots of deadlocks, this may write a lot to disk.
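If deadlocks do turn out to be a problem, this flag is also dynamic, so it can be toggled without a restart (sketch):

SET GLOBAL innodb_print_all_deadlocks = ON;  -- every deadlock then goes to the error log; set it back to OFF if the log gets noisy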

( join_buffer_size / _ram ) = 262,144 / 16384M = 0.00% -- 0-N per thread. May speed up JOINs (better to fix queries/indexes) (all engines) Used for index scan, range index scan, full table scan, each full JOIN, etc. -- If large, decrease join_buffer_size to avoid memory pressure. Suggest less than 1% of RAM. If small, increase to 0.01% of RAM to improve some queries.

( local_infile ) = local_infile = ON -- local_infile = ON is a potential security issue

( Handler_read_rnd_next / Com_select ) = 10,900,684,560 / 1418310 = 7,685 -- Avg rows scanned per SELECT. (approx) -- Consider raising read_buffer_size (128K now; unclear whether raising it will help)

( Select_scan ) = 233,714 / 97156 = 2.4 /sec -- full table scans -- Add indexes / optimize queries (unless they are tiny tables)

( Select_scan / Com_select ) = 233,714 / 1418310 = 16.5% -- % of selects doing full table scan. (May be fooled by Stored Routines.) -- Add indexes / optimize queries
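To see which statements are behind those scans (assuming performance_schema and the bundled sys schema are enabled, as they are by default in 5.7), something like this can help:

SELECT query, db, exec_count, no_index_used_count
FROM sys.statements_with_full_table_scans
ORDER BY no_index_used_count DESC
LIMIT 10;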

( Connections ) = 131,201 / 97156 = 1.4 /sec -- Connections -- Increase wait_timeout; use pooling?

Abnormally small:

Innodb_dblwr_pages_written / Innodb_dblwr_writes = 2.31

Abnormally large:

Com_show_plugins = 0.26 /HR
Com_show_privileges = 0.037 /HR
Com_stmt_close = 21 /sec
Com_stmt_execute = 21 /sec
Com_stmt_prepare = 21 /sec
Innodb_buffer_pool_pages_free = 511,813
Performance_schema_file_instances_lost = 9
innodb_page_cleaners = 12
performance_schema_max_file_classes = 80
performance_schema_max_mutex_classes = 210

Abnormal strings:

innodb_fast_shutdown = 1
innodb_large_prefix = ON
log_slow_admin_statements = ON