Mysql – How to improve thesql replication performance reading the relay logs in slave

MySQLperformancereplication

I have a mysql 5.5 row-based replication being done. The slave is slowly getting delayed relative to the master. On show slave status I see the distance between Master_Log_File and Relay_Master_Log_File increase, the slave can't keep up. I have 30G of pending logs, in Relay_Log_Space.

I read that binlog_row_image (which is not available in 5.5 but I could upgrade if needed) set to minimal would improve disk, network and memory, none of which seems to be my problem – I need faster processing of the relay logs. From what I read, large amounts of updates are the cause for my problem: I have more than 80 million updates per day. These updates happen mostly in one column of a table with PK (I read that not having a PK would cause3 poor performance, but that's covered).

The column being updated has an index itself. Would dropping this index help improve my performance? (I can dispose of it, if I really need to). Would upgrading mysql to a newer version make it possible to process the logs in parallel instead of sequentially? Is there anything else I could do to improve the processing of the relay log files (other than upgrading my hardware) ?

Update: I read that having heavy queries in the slave DB will slow down the replication, but the slave currently doesn't have any significant queries. It is queried several times throughout the day, but I'm closely monitoring it and no particular query is hogging it, and there isn't a big amount of queries either; actually the DB spends more than 50% of the time just idle.

Update 2: The slave has 32G RAM. See below the innodb variables:

mysql> SHOW VARIABLES LIKE 'innodb%';
+---------------------------------+------------------------+
| Variable_name                   | Value                  |
+---------------------------------+------------------------+
| innodb_adaptive_flushing        | ON                     |
| innodb_adaptive_hash_index      | ON                     |
| innodb_additional_mem_pool_size | 536870912              |
| innodb_autoextend_increment     | 8                      |
| innodb_autoinc_lock_mode        | 1                      |
| innodb_buffer_pool_instances    | 1                      |
| innodb_buffer_pool_size         | 21474836480            |
| innodb_change_buffering         | all                    |
| innodb_checksums                | ON                     |
| innodb_commit_concurrency       | 0                      |
| innodb_concurrency_tickets      | 500                    |
| innodb_data_file_path           | ibdata1:10M:autoextend |
| innodb_data_home_dir            |                        |
| innodb_doublewrite              | ON                     |
| innodb_fast_shutdown            | 1                      |
| innodb_file_format              | Antelope               |
| innodb_file_format_check        | ON                     |
| innodb_file_format_max          | Antelope               |
| innodb_file_per_table           | ON                     |
| innodb_flush_log_at_trx_commit  | 2                      |
| innodb_flush_method             |                        |
| innodb_force_recovery           | 0                      |
| innodb_io_capacity              | 200                    |
| innodb_lock_wait_timeout        | 50                     |
| innodb_locks_unsafe_for_binlog  | OFF                    |
| innodb_log_buffer_size          | 268435456              |
| innodb_log_file_size            | 367001600              |
| innodb_log_files_in_group       | 2                      |
| innodb_log_group_home_dir       | ./                     |
| innodb_max_dirty_pages_pct      | 75                     |
| innodb_max_purge_lag            | 0                      |
| innodb_mirrored_log_groups      | 1                      |
| innodb_old_blocks_pct           | 37                     |
| innodb_old_blocks_time          | 0                      |
| innodb_open_files               | 300                    |
| innodb_purge_batch_size         | 20                     |
| innodb_purge_threads            | 0                      |
| innodb_read_ahead_threshold     | 56                     |
| innodb_read_io_threads          | 4                      |
| innodb_replication_delay        | 0                      |
| innodb_rollback_on_timeout      | OFF                    |
| innodb_spin_wait_delay          | 6                      |
| innodb_stats_on_metadata        | ON                     |
| innodb_stats_sample_pages       | 8                      |
| innodb_strict_mode              | OFF                    |
| innodb_support_xa               | ON                     |
| innodb_sync_spin_loops          | 30                     |
| innodb_table_locks              | ON                     |
| innodb_thread_concurrency       | 0                      |
| innodb_thread_sleep_delay       | 10000                  |
| innodb_use_native_aio           | ON                     |
| innodb_use_sys_malloc           | ON                     |
| innodb_version                  | 1.1.4                  |
| innodb_write_io_threads         | 4                      |
+---------------------------------+------------------------+

See below the create table:

CREATE TABLE `Filecheck` (
  `FilecheckID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `FileID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `NameID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `LastTimeChecked` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `FileTypeID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `ExtraInfo` bigint(20) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`FilecheckID`),
  UNIQUE KEY `FileID` (`FileID`),
  KEY `Filecheck_FKIndex1` (`NameID`),
  KEY `LastTimeChecked` (`LastTimeChecked`,`FileID`),
  KEY `FileID_2` (`FileID`,`LastTimeChecked`),
  KEY `NameID` (`NameID`,`FileID`)
) ENGINE=InnoDB AUTO_INCREMENT=347831298 DEFAULT CHARSET=utf8

The column that is being heavily updated is LastTimeChecked.

Best Answer

  • Upgrade to 5.6 and use multi-threaded replication. But... This is limited to different threads for different databases.

  • Use Galera -- it does not have the tight coupling between threads and databases, so it does (in theory) a better job of parallel execution.

  • Read this for more thoughts.

Also, let's see some of the more naughty queries, plus associated SHOW CREATE TABLE. There may be some simple changes to speed up execution, thereby decrease the delays.

If you are using InnoDB, let's see SHOW VARIABLES LIKE 'innodb%' on the Slave, plus how much RAM does the Slave have?

Are you doing huge SELECTs on the Slave? These could be blocking replication. Glance at SHOW PROCESSLIST a few times -- you may get a feel that there is blocking going on. If you have huge Data Warehouse selects building "reports", let's talk about Summary Tables.

Can you think of other aspects of your situation that might lead to more hints from me?

Edit(s)

It's a rather big change, but you don't need both of FileId and FilecheckID; FileID (a BIGINT) could be the PRIMARY KEY instead of FilecheckID. Sure, that would be 8 bytes instead of 4, but it would shrink the size of most of the secondary keys. (FileID,LastTimeChecked) becomes unnecessary because of FileID being the PK. Keep in mind that (in InnoDB) the PRIMARY KEY is implicitly tacked on to every secondary key. Net change: 2 fewer keys (faster Slave); get rid of a UNIQUE key (which must be checked on every INSERT); shrink 2 secondary keys; expand one secondary key.

How many FileTypes are there? Surely not more than 4 billion? FileTypeID bigint(20) -- Perhaps TINYINT UNSIGNED (1 byte instead of 8) would suffice? (256 possible values.) Yes, other tables that need to JOIN to this one would need a BIGINT instead of an INT, but the change might still be worth it.

Caution: Any schema change on such a big table would take a long time.

innodb_buffer_pool_size = 21G (on 32GB machines) is reasonable. innodb_buffer_pool_instances could be changed to 16, for a minor speed up. What's the disk subsystem like? I'm wondering if innodb_io_capacity can be increased. Other settings seem ok.

Sorry, nothing (yet) jumps out and says "Aha!". I can't tell whether the schema changes would be enough to solve the 'problem'.