Mysql – How to improve thesql replication performance reading the relay logs in slave

MySQLperformancereplication

I have a mysql 5.5 row-based replication being done. The slave is slowly getting delayed relative to the master. On show slave status I see the distance between Master_Log_File and Relay_Master_Log_File increase, the slave can't keep up. I have 30G of pending logs, in Relay_Log_Space.

I read that binlog_row_image (which is not available in 5.5 but I could upgrade if needed) set to minimal would improve disk, network and memory, none of which seems to be my problem – I need faster processing of the relay logs. From what I read, large amounts of updates are the cause for my problem: I have more than 80 million updates per day. These updates happen mostly in one column of a table with PK (I read that not having a PK would cause3 poor performance, but that's covered).

The column being updated has an index itself. Would dropping this index help improve my performance? (I can dispose of it, if I really need to). Would upgrading mysql to a newer version make it possible to process the logs in parallel instead of sequentially? Is there anything else I could do to improve the processing of the relay log files (other than upgrading my hardware) ?

Update: I read that having heavy queries in the slave DB will slow down the replication, but the slave currently doesn't have any significant queries. It is queried several times throughout the day, but I'm closely monitoring it and no particular query is hogging it, and there isn't a big amount of queries either; actually the DB spends more than 50% of the time just idle.

Update 2: The slave has 32G RAM. See below the innodb variables:

mysql> SHOW VARIABLES LIKE 'innodb%';
+---------------------------------+------------------------+
| Variable_name                   | Value                  |
+---------------------------------+------------------------+
| innodb_adaptive_flushing        | ON                     |
| innodb_adaptive_hash_index      | ON                     |
| innodb_additional_mem_pool_size | 536870912              |
| innodb_autoextend_increment     | 8                      |
| innodb_autoinc_lock_mode        | 1                      |
| innodb_buffer_pool_instances    | 1                      |
| innodb_buffer_pool_size         | 21474836480            |
| innodb_change_buffering         | all                    |
| innodb_checksums                | ON                     |
| innodb_commit_concurrency       | 0                      |
| innodb_concurrency_tickets      | 500                    |
| innodb_data_file_path           | ibdata1:10M:autoextend |
| innodb_data_home_dir            |                        |
| innodb_doublewrite              | ON                     |
| innodb_fast_shutdown            | 1                      |
| innodb_file_format              | Antelope               |
| innodb_file_format_check        | ON                     |
| innodb_file_format_max          | Antelope               |
| innodb_file_per_table           | ON                     |
| innodb_flush_log_at_trx_commit  | 2                      |
| innodb_flush_method             |                        |
| innodb_force_recovery           | 0                      |
| innodb_io_capacity              | 200                    |
| innodb_lock_wait_timeout        | 50                     |
| innodb_locks_unsafe_for_binlog  | OFF                    |
| innodb_log_buffer_size          | 268435456              |
| innodb_log_file_size            | 367001600              |
| innodb_log_files_in_group       | 2                      |
| innodb_log_group_home_dir       | ./                     |
| innodb_max_dirty_pages_pct      | 75                     |
| innodb_max_purge_lag            | 0                      |
| innodb_mirrored_log_groups      | 1                      |
| innodb_old_blocks_pct           | 37                     |
| innodb_old_blocks_time          | 0                      |
| innodb_open_files               | 300                    |
| innodb_purge_batch_size         | 20                     |
| innodb_purge_threads            | 0                      |
| innodb_read_ahead_threshold     | 56                     |
| innodb_read_io_threads          | 4                      |
| innodb_replication_delay        | 0                      |
| innodb_rollback_on_timeout      | OFF                    |
| innodb_spin_wait_delay          | 6                      |
| innodb_stats_on_metadata        | ON                     |
| innodb_stats_sample_pages       | 8                      |
| innodb_strict_mode              | OFF                    |
| innodb_support_xa               | ON                     |
| innodb_sync_spin_loops          | 30                     |
| innodb_table_locks              | ON                     |
| innodb_thread_concurrency       | 0                      |
| innodb_thread_sleep_delay       | 10000                  |
| innodb_use_native_aio           | ON                     |
| innodb_use_sys_malloc           | ON                     |
| innodb_version                  | 1.1.4                  |
| innodb_write_io_threads         | 4                      |
+---------------------------------+------------------------+

See below the create table:

CREATE TABLE `Filecheck` (
  `FilecheckID` int(10) unsigned NOT NULL AUTO_INCREMENT,
  `FileID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `NameID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `LastTimeChecked` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `FileTypeID` bigint(20) unsigned NOT NULL DEFAULT '0',
  `ExtraInfo` bigint(20) unsigned NOT NULL DEFAULT '0',
  PRIMARY KEY (`FilecheckID`),
  UNIQUE KEY `FileID` (`FileID`),
  KEY `Filecheck_FKIndex1` (`NameID`),
  KEY `LastTimeChecked` (`LastTimeChecked`,`FileID`),
  KEY `FileID_2` (`FileID`,`LastTimeChecked`),
  KEY `NameID` (`NameID`,`FileID`)
) ENGINE=InnoDB AUTO_INCREMENT=347831298 DEFAULT CHARSET=utf8

The column that is being heavily updated is LastTimeChecked.

Best Answer

Upgrade to 5.6 and use multi-threaded replication. But... This is limited to different threads for different databases.
Use Galera -- it does not have the tight coupling between threads and databases, so it does (in theory) a better job of parallel execution.
Read this for more thoughts.

Also, let's see some of the more naughty queries, plus associated SHOW CREATE TABLE. There may be some simple changes to speed up execution, thereby decrease the delays.

If you are using InnoDB, let's see SHOW VARIABLES LIKE 'innodb%' on the Slave, plus how much RAM does the Slave have?

Are you doing huge SELECTs on the Slave? These could be blocking replication. Glance at SHOW PROCESSLIST a few times -- you may get a feel that there is blocking going on. If you have huge Data Warehouse selects building "reports", let's talk about Summary Tables.

Can you think of other aspects of your situation that might lead to more hints from me?

Edit(s)

It's a rather big change, but you don't need both of FileId and FilecheckID; FileID (a BIGINT) could be the PRIMARY KEY instead of FilecheckID. Sure, that would be 8 bytes instead of 4, but it would shrink the size of most of the secondary keys. (FileID,LastTimeChecked) becomes unnecessary because of FileID being the PK. Keep in mind that (in InnoDB) the PRIMARY KEY is implicitly tacked on to every secondary key. Net change: 2 fewer keys (faster Slave); get rid of a UNIQUE key (which must be checked on every INSERT); shrink 2 secondary keys; expand one secondary key.

How many FileTypes are there? Surely not more than 4 billion? FileTypeID bigint(20) -- Perhaps TINYINT UNSIGNED (1 byte instead of 8) would suffice? (256 possible values.) Yes, other tables that need to JOIN to this one would need a BIGINT instead of an INT, but the change might still be worth it.

Caution: Any schema change on such a big table would take a long time.

innodb_buffer_pool_size = 21G (on 32GB machines) is reasonable. innodb_buffer_pool_instances could be changed to 16, for a minor speed up. What's the disk subsystem like? I'm wondering if innodb_io_capacity can be increased. Other settings seem ok.

Sorry, nothing (yet) jumps out and says "Aha!". I can't tell whether the schema changes would be enough to solve the 'problem'.

SUGGESTION

Of the things you are using, you only need these lines

replicate-do-db=our_database  
log-slave-updates

You really don't need binlog-do-db=our_database once you have log-slave-updates working properly.

MySQL looking up more rows than needed (indexing issue)

Your indexes are fine for the two types of queries you mentioned.

This query will be satisfied by traversing the clustered index on the primary key...

[...] WHERE participant_id = x AND question_id = y AND given_answer_id = z;

...and this one is satisfied by the index on 'question_id':

[...] WHERE question_id = x;

The output of EXPLAIN SELECT is not telling you what you think it is telling you, because the value shown in rows is an estimate of the number of rows the server will need to consider, not the actual rows it will examine. For InnoDB these are based on index statistics.

rows

The rows column indicates the number of rows MySQL believes it must examine to execute the query.

For InnoDB tables, this number is an estimate, and may not always be exact.

^{— http://dev.mysql.com/doc/refman/5.5/en/explain-output.html#explain_rows}

The optimizer gathers information about different possible query plans, and chooses the one with the lowest cost. The information shown in EXPLAIN is the information the optimizer gathered about the plan it selected.

When type is ref and key is not NULL, this means that the name listed in the key column is the name of the index that the optimizer has chosen to use to find the desired rows, so your query plan looks exactly as it should.

Note, sometimes you will see Using index in the Extra column and a lot of people assume that this means an index is being used, or that no index is being used when that doesn't appear, but that's not correct, either. Using index describes a special case called a "covering index" -- it does not indicate whether an index is being used to locate the rows of interest.

It's possible that running ANALYZE [LOCAL] TABLE would cause the numbers in rows shown by EXPLAIN to differ, but this is a simple query and selecting this index is an obvious choice for the optimizer to make, so ANALYZE TABLE is unlikely to make any actual difference in performance.

It is possible, however, that your overall performance might see some marginal improvement with an occasional OPTIMIZE [LOCAL] TABLE, because you are not inserting rows in primary key order (as would be the case with an auto_increment primary key)... but on large tables this can be time-consuming because it rebuilds a new copy of the table... but, again, I wouldn't expect any significant change.

Best Answer

Related Solutions

MySQL Slave Relay Logging but not logging Binary Log

SUGGESTION

MySQL looking up more rows than needed (indexing issue)

Related Question