Mysql – thesql 5.6 flushing all dirty pages on ib_logfile rotation creating checkpoint stall

innodbmysql-5.6

Server version: 5.6.23-72.1-log Percona

I have been trying to track down why our mysql server is sharp checkpointing and it seems to correlate with rotating of the ib_logfile.

I inherited this Percona 5.6 mysql server and it was pausing for the sharp checkpoint / full dirty flush once a day with innodb_log_file_size of 1GB. I thought perhaps it was insufficient logfile size and bumped it up to 25GB. Now we just get the same behavior less frequently. It was a sort of mystery to me until I was able to verify it happened when the logfile switched.

Mysql 5.5 docs: https://dev.mysql.com/doc/refman/5.5/en/innodb-checkpoints.html: "…when InnoDB starts to reuse a log file, it has to make sure that the database page images on disk contain the modifications logged in the log file that InnoDB is going to reuse. In other words, InnoDB must create a checkpoint and this often involves flushing of modified database pages to disk."

Mysql 5.6 docs omit this bit and say all it should do is fuzzy checkpointing in normal circumstances. https://dev.mysql.com/doc/refman/5.6/en/innodb-checkpoints.html: "InnoDB implements a checkpoint mechanism known as fuzzy checkpointing. InnoDB flushes modified database pages from the buffer pool in small batches. There is no need to flush the buffer pool in one single batch, which would disrupt processing of user SQL statements during the checkpointing process."

Is there any way to minimize or get out of this behavior if it is unexpected?

| innodb_adaptive_flushing | ON |
| innodb_adaptive_flushing_lwm | 10 |
| innodb_buffer_pool_size | 46170898432 |
| innodb_change_buffer_max_size | 25 |
| innodb_change_buffering | inserts |
| innodb_checksum_algorithm | innodb |
| innodb_checksums | ON |
| innodb_cleaner_lsn_age_factor | high_checkpoint |
| innodb_doublewrite | ON |
| innodb_empty_free_list_algorithm | backoff |
| innodb_file_format | Barracuda |
| innodb_file_per_table | ON |
| innodb_flush_log_at_timeout | 1 |
| innodb_flush_log_at_trx_commit | 1 |
| innodb_flush_method | O_DIRECT |
| innodb_flush_neighbors | 0 |
| innodb_flushing_avg_loops | 30 |
| innodb_force_load_corrupted | OFF |
| innodb_force_recovery | 0 |
| innodb_foreground_preflush | exponential_backoff |
| innodb_io_capacity | 3000 |
| innodb_io_capacity_max | 6000 |
| innodb_log_arch_dir | ./ |
| innodb_log_arch_expire_sec | 0 |
| innodb_log_archive | OFF |
| innodb_log_block_size | 512 |
| innodb_log_buffer_size | 8388608 |
| innodb_log_checksum_algorithm | innodb |
| innodb_log_compressed_pages | ON |
| innodb_log_file_size | 26843545600 |
| innodb_log_files_in_group | 2 |
| innodb_log_group_home_dir | ./ |
| innodb_lru_scan_depth | 2048 |
| innodb_max_changed_pages | 1000000 |
| innodb_max_dirty_pages_pct | 50 |
| innodb_max_dirty_pages_pct_lwm | 0 |
| innodb_max_purge_lag | 200 |
| innodb_max_purge_lag_delay | 0 |
| innodb_old_blocks_pct | 37 |
| innodb_old_blocks_time | 1000 |
| innodb_online_alter_log_max_size | 134217728 |
| innodb_open_files | 384 |
| innodb_page_size | 16384 |
| innodb_purge_batch_size | 20 |
| innodb_purge_threads | 1 |
| innodb_random_read_ahead | OFF |
| innodb_read_ahead_threshold | 56 |
| innodb_read_io_threads | 4 |
| innodb_read_only | OFF |
| innodb_rollback_on_timeout | OFF |
| innodb_rollback_segments | 128 |
| innodb_sched_priority_cleaner | 19 |
| innodb_sort_buffer_size | 1048576 |
| innodb_spin_wait_delay | 6 |
| innodb_sync_spin_loops | 30 |
| innodb_table_locks | ON |
| innodb_thread_concurrency | 0 |
| innodb_thread_sleep_delay | 10000 |
| innodb_use_global_flush_log_at_trx_commit | ON |
| innodb_use_native_aio | ON |
| innodb_use_sys_malloc | ON |
| innodb_version | 5.6.23-72.1 |
| innodb_write_io_threads | 8 |

Best Answer

The sentence about requiring a checkpoint before the log file rotation/reuse may have been removed from documentation, but the logic hasn't changed in version 5.6 -- you cannot overwrite log records that belong to not-yet-persisted data, so you have to checkpoint still. The fact that it is happening in your environment indicates that flushing of dirty pages with fuzzy checkpoints is not aggressive enough for your workload.

I'd try raising the innodb_adaptive_flushing_lwm threshold and lowering innodb_max_dirty_pages_pct to make page cleaning more proactive.

There are more hints and links in this article