MySQL RDS Read Replica – How to Fix Replication Error

MySQL

I use amazon rds and have a cross region replica. While I was doing some schema changes my read replica gave me this error and replication stopped.

Message : Read Replica Replication Error – SQLError: 1146, reason:
Error Table phppoint_koreannoona.phppos_items doesn't exist on query.
Default database: DB_NAME;. Query: ALTER TABLE phppos_items DROP
COLUMN aaatex_qb_item_name

What would cause this? The table is in the master. My migration DID remove that field but entire table was missing in slave.

I had to rebuild slave.

Best Answer

What would cause this?

Let's go into facts, then we will later have to go into speculation:

Table phppoint_koreannoona.phppos_items doesn't exist on query

You executed an ALTER TABLE query on a table that didn't exist. Your table existed on the master at the time the alter was ran (otherwise, it wouldn't have gone through the binlog and tried to be executed on the replica), but didn't exist on the replica. This makes replication break because it cannot continue in a consistent way.

What can make a table exist on the master and not on the replica? Several options:

  • The table was never there. In order to create a replica, RDS does a snapshot of the master and transfer it away. Depending on when or how it is done, the replica can be non-consistent (for example, I would be skeptical of a snapshot done at the same time than a running schema change). Maybe it was a memory only table and it wasn't properly transferred to disk? Maybe it was in a particular engine that has issues with transactionality (MyISAM).
  • The table was there, but it was deleted somehow.
  • Something else happened that caused the replicas to go out of sync. The exact reason will depend highly on the environement, configuration and the application. Some examples include:

    • A crash of the replica in a non-crash safe configuration

    • Replication coordinates were changed inappropriately or other maintenance introduced a mistake on replication

    • Unsafe write statements or configuration is used, which leads to inconsistencies

    • Filtering is used, leading to data inconsistencies

    • Queries were run on the master with replication/binlog disabled

How to fix that?

The safest way to fix inconsistencies on a replica is to recreate it from scratch. You can copy a single table and synchronize it relatively easily if it has a low amount of writes, but that will not prevent you from having other data drift on other tables.

How to prevent that from happening again?

The most important part is to have checks in places to detect issues early. Tools such as pt-table-checksum or simple dumps and compares may allow you to detect differences earlier. It is impossible, with the information given, to say exactly what caused that, as mysql configuration will be safe or not depending on the specific setup. Some general advice:

  • Try using ROW based replication, it is usually more resilient against bad configuration/queries
  • Check for "unsafe statements for binlog" warnings on the logs
  • Check the logs for replication changes due to improper maintenance
  • Make sure replicas are really in read only and they cannot be written or manipulated out of band
  • Use innodb on all your tables, as well as crash-resilient configuration for your replicas