Mysql – RDS MySQL read replica larger than master

amazon-rdsMySQLreplicationstorage

We have a Multi AZ AWS MySQL 5.6.40 server with a matching single AZ read replica. Both have 200GB of storage available. However, the read replica is rapidly running out of free storage space, while the master server is not.

This graph shows AWS CloudWatch's comparison between the FreeStorageSpace metrics for the two databases over the last 6 weeks. Today, the master database has 184GB free, while the read replica has only 50GB, and this difference is increasing daily.

Have run through the Amazon RDS troubleshooting guide for MySQL storage, and found:

  • the binlog storage is minimal
  • error logs are minimal (and debug and slow query logs are disabled)
  • the data is not noticeably fragmented.

The master server has binlog_format: MIXED, and the read replica has binlong_format: STATEMENT. All other bin* variables are the same between the two servers.

Queries against the information_schema to compare data_length, index_length and data_free show that the read replica exceeds the master server on almost all metrics, and that there's not much fragmentation on either server.

Storage difference between read replica and master:

  • data_length: 80.4 GB
  • index_length: 37.4 GB
  • data_free: -628 MB (replica has more data free space than master)

Row counts for all tables match on both master and the read replica.

A single database (reports_1_0) represents the majority of the storage differences. Its tables receive the most writes and deletions. This database is also the target of the most direct reads on the read replica server.

SHOW TABLE STATUS; for reports_1_0 shows:

+-----------------------------------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| Name                                    | Engine | Version | Row_format | Rows    | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time | Check_time | Collation       | Checksum | Create_options | Comment |
+-----------------------------------------+--------+---------+------------+---------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+-------------+------------+-----------------+----------+----------------+---------+
| answer_distribution                     | InnoDB |      10 | Compact    |  316274 |           1175 |   371654656 |               0 |    412729344 | 196083712 |       40680176 | 2016-05-07 01:55:31 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
| course_activity                         | InnoDB |      10 | Compact    |   19692 |            213 |     4210688 |               0 |      6324224 |   5242880 |         726134 | 2016-05-09 14:52:48 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
| course_enrollment_birth_year_daily      | InnoDB |      10 | Compact    | 2527043 |            188 |   477118464 |               0 |    825262080 | 267386880 |      453376362 | 2016-05-06 21:07:28 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
| course_enrollment_daily                 | InnoDB |      10 | Compact    |  184120 |             97 |    17874944 |               0 |     59932672 |  22020096 |       34341008 | 2016-05-06 21:03:56 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
| course_enrollment_education_level_daily | InnoDB |      10 | Compact    |  771408 |            168 |   130203648 |               0 |    238436352 |  84934656 |      144463241 | 2016-05-06 21:06:18 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
| course_enrollment_gender_daily          | InnoDB |      10 | Compact    |  541000 |            155 |    84017152 |               0 |    151584768 |  54525952 |       96118437 | 2016-05-06 21:08:42 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
| course_enrollment_location_current      | InnoDB |      10 | Compact    |    1965 |            100 |      196608 |               0 |       360448 |   4194304 |         345316 | 2016-05-07 15:33:58 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
| course_enrollment_mode_daily            | InnoDB |      10 | Compact    |  211359 |            164 |    34684928 |               0 |     66322432 |  26214400 |       39079118 | 2016-05-06 21:05:00 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
| last_country_of_user                    | InnoDB |      10 | Compact    |   73504 |             71 |     5259264 |               0 |            0 |   8388608 |       10287572 | 2016-05-07 15:29:51 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
| table_updates                           | InnoDB |      10 | Compact    |      11 |           1489 |       16384 |               0 |        32768 |         0 |           7947 | 2016-05-06 21:24:27 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
| video                                   | InnoDB |      10 | Compact    |    8613 |            488 |     4210688 |               0 |      3211264 |   4194304 |        1400348 | 2016-05-07 06:47:54 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |
| video_timeline                          | InnoDB |      10 | Compact    |  469912 |            165 |    77709312 |               0 |    139476992 |  66060288 |       71899522 | 2016-05-07 06:50:54 | NULL        | NULL       | utf8_general_ci |     NULL |                |         |

SHOW SLAVE STATUS; on the read replica shows:

+----------------------------------+--------------+--------------+-------------+---------------+----------------------------+---------------------+-----------------+---------------+----------------------------+------------------+-------------------+-----------------+---------------------+--------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-----------------------------+------------+------------+--------------+---------------------+-----------------+-----------------+----------------+---------------+--------------------+--------------------+--------------------+-----------------+-------------------+----------------+-----------------------+-------------------------------+---------------+---------------+----------------+----------------+-----------------------------+------------------+--------------------------------------+-------------------------+-----------+---------------------+-----------------------------------------------------------------------------+--------------------+-------------+-------------------------+--------------------------+----------------+--------------------+--------------------+-------------------+---------------+
| Slave_IO_State                   | Master_Host  | Master_User  | Master_Port | Connect_Retry | Master_Log_File            | Read_Master_Log_Pos | Relay_Log_File  | Relay_Log_Pos | Relay_Master_Log_File      | Slave_IO_Running | Slave_SQL_Running | Replicate_Do_DB | Replicate_Ignore_DB | Replicate_Do_Table | Replicate_Ignore_Table                                                                                                                                                                | Replicate_Wild_Do_Table | Replicate_Wild_Ignore_Table | Last_Errno | Last_Error | Skip_Counter | Exec_Master_Log_Pos | Relay_Log_Space | Until_Condition | Until_Log_File | Until_Log_Pos | Master_SSL_Allowed | Master_SSL_CA_File | Master_SSL_CA_Path | Master_SSL_Cert | Master_SSL_Cipher | Master_SSL_Key | Seconds_Behind_Master | Master_SSL_Verify_Server_Cert | Last_IO_Errno | Last_IO_Error | Last_SQL_Errno | Last_SQL_Error | Replicate_Ignore_Server_Ids | Master_Server_Id | Master_UUID                          | Master_Info_File        | SQL_Delay | SQL_Remaining_Delay | Slave_SQL_Running_State                                                     | Master_Retry_Count | Master_Bind | Last_IO_Error_Timestamp | Last_SQL_Error_Timestamp | Master_SSL_Crl | Master_SSL_Crlpath | Retrieved_Gtid_Set | Executed_Gtid_Set | Auto_Position |
+----------------------------------+--------------+--------------+-------------+---------------+----------------------------+---------------------+-----------------+---------------+----------------------------+------------------+-------------------+-----------------+---------------------+--------------------+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+-------------------------+-----------------------------+------------+------------+--------------+---------------------+-----------------+-----------------+----------------+---------------+--------------------+--------------------+--------------------+-----------------+-------------------+----------------+-----------------------+-------------------------------+---------------+---------------+----------------+----------------+-----------------------------+------------------+--------------------------------------+-------------------------+-----------+---------------------+-----------------------------------------------------------------------------+--------------------+-------------+-------------------------+--------------------------+----------------+--------------------+--------------------+-------------------+---------------+
| Waiting for master to send event | 172.20.1.250 | rdsrepladmin |        3306 |            60 | mysql-bin-changelog.298735 |               25076 | relaylog.006485 |         25192 | mysql-bin-changelog.298735 | Yes              | Yes               |                 |                     |                    | mysql.plugin,mysql.rds_monitor,mysql.rds_sysinfo,innodb_memcache.cache_policies,mysql.rds_history,innodb_memcache.config_options,mysql.rds_configuration,mysql.rds_replication_status |                         |                             |          0 |            |            0 |               25076 |           25531 | None            |                |             0 | No                 |                    |                    |                 |                   |                |                     0 | No                            |             0 |               |              0 |                |                             |       1615132695 | 7742e728-07f3-11e6-86b5-0a63f0cc4f67 | mysql.slave_master_info |         0 |                NULL | Slave has read all relay log; waiting for the slave I/O thread to update it |              86400 |             |                         |                          |                |                    |                    |                   |             0 |

SHOW BINARY LOGS on the master shows:

+----------------------------+-----------+
| Log_name                   | File_size |
+----------------------------+-----------+
| mysql-bin-changelog.299084 |   3574293 |
| mysql-bin-changelog.299085 |   3435024 |
| mysql-bin-changelog.299086 |   2907531 |
| mysql-bin-changelog.299087 |   2160437 |
+----------------------------+-----------+

(SHOW BINARY LOG; on replica shows ERROR 1381 (HY000): You are not using binary logging of course.)

SHOW VARIABLES LIKE 'expire%'; on both master and replica shows:

+------------------+-------+
| Variable_name    | Value |
+------------------+-------+
| expire_logs_days | 0     |
+------------------+-------+

SHOW ENGINE INNODB STATUS; on the master shows a reasonable History list length given the number of transactions pending:

------------
TRANSACTIONS
------------
Trx id counter 1901470980
Purge done for trx's n:o < 1901470962 undo n:o < 0 state: running but idle
History list length 1258
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 5718644, OS thread handle 0x2ade8594f700, query id 217905009 10.12.0.247 root init
SHOW engine INNODB STATUS
---TRANSACTION 1901470970, not started
...<snip>

However, the read replica's History list length is huge, given that it's mainly just used for offline reporting and not many active sessions:

------------
TRANSACTIONS
------------
Trx id counter 73241661
Purge done for trx's n:o < 36264361 undo n:o < 0 state: running but idle
History list length 17995717
LIST OF TRANSACTIONS FOR EACH SESSION:
---TRANSACTION 0, not started
MySQL thread id 45590, OS thread handle 0x2b4fab36e700, query id 77505863 10.12.0.247 root init
show engine innodb status
---TRANSACTION 73237845, not started
MySQL thread id 43037, OS thread handle 0x2b4fab3f0700, query id 0 Waiting for master to send event
---TRANSACTION 73241659, not started
MySQL thread id 43038, OS thread handle 0x2b4fab3af700, query id 77505862 Slave has read all relay log; waiting for the slave I/O thread to update it
---TRANSACTION 73241646, not started
MySQL thread id 35815, OS thread handle 0x2b4fab4b3700, query id 77505844 localhost 127.0.0.1 rdsadmin
--------

How do I account for, and repair, this storage difference?

Best Answer

My team located the issue -- there was a massive unfinished query started on the reports_1_0 database around 100 days ago.

Killing this query, and running OPTIMIZE TABLE on the master database's tables is releasing most of the extra data and index storage used by the read replica.