Mysql – Finding differences between a master and slave when pt-table-sync fails

MySQLmysql-5.5percona-toolsreplication

I have a table that's showing some differences on a master and slave according to pt-table-checksum (PTC). I've found some verifying differences beyond that.

Certain tables in the mix have composite primary keys which, while work well enough for PTC do not so much with pt-table-sync (PTS) to try and find the differences. PTS does not seem to honor the newer options of PTC to restrict the depth of composite key searches. End result is the a 5.5Mrow table is left spinning for hours w/ PTS. I'm somewhat convinced there's an infinite loop bug in PTS as the output has just started spitting out the a correction for the same exact record (and nothing else at this point) over and over.

So, really this post is more about alternatives than trying to get this tool to run right.

The most straight forward alternative that comes to mind is flush table with read lock on the master; select * into outfile on master and slave; unlock and diff. However this table is fairly active and I cannot afford it being locked out on the master that long.

I was hoping to do something like start a transaction in repeatable read isolation level and select out from the master there. However I can't find a way to have the slave be stopped specifically at the point in the transaction history as on the master. Things like show master status continue to update after starting a transaction so I can't simply start slave until then.

There's no guarantees about atomicity to "quickly" do "show master status; begin;"

The only other solution of a non blocking method is to write a custom script that might be able to more efficiently do a PTS type thing but with application specific domain knowledge about how the columns of the table in question are generated.

While it might come to that I was hoping to find a more generalized solution for the future when prebuild wheels like PTS turn out not as well rounded as I'd like.

Best Answer

Here's a method that does not interrupt service on the master, though it does lock the slave temporarily.

  1. Stop the slave.

    mysql> STOP SLAVE;
    
  2. Dump the offending table on the master with mysqldump in a repeatable-read transaction. Include the binary log coordinates in the output (--master-data). Output one INSERT per row (--skip-extended-insert).

    $ mysqldump --single-transaction --master-data=2 --skip-extended-insert 
      mydatabase myoffendingtable > master-dump.sql
    
  3. Note the binlog coordinates in the dump file, and start slave until the coordinate shown in the master's dump file.

    mysql> START SLAVE UNTIL MASTER_LOG_FILE='xxxx' MASTER_LOG_POS=yyyy;
    
  4. Wait until the slave thread catches up to that position and stops itself. This shouldn't take long.

    mysql> SELECT MASTER_POS_WAIT('xxxx', yyyy);
    
  5. Dump the table on the slave, also with one row per INSERT.

    $ mysqldump --skip-extended-insert mydatabase myoffendingtable > slave-dump.sql
    
  6. Resume replication.

    mysql> START SLAVE;
    

Now you can diff the dump files, and be assured they represent exactly the same logical point in the stream of changes. This allows you to find data discrepancies.