Mysql – filter the percona.checksums table in replication for use with pt-table-checksum

MySQLpercona-toolsreplication

I've set up MySQL replication, one master and one slave. Now I just starting to check the data for integrity on both sides by using Percona's pt-table-checksum. Wonderful tool to detect any changes or drift on the slave.

From what I understand from its documentation, it runs on the master, writes checksums to the percona.checksums table and runs similarly on the slaves connected (automatically detected) using a regular remote MySQL connection. However, the percona.checksums table is actually also replicated! If data differs on the slave, the checksums from the master may be overwritten on the slave and not being detected.

Now, I am worried this might be causing my observed false negatives in detecting changes.

Does pt-table-checksum have safeguards put into place for preventing replication on the percona.checksums table?

  • If so, how does this work? (as far as I can see, one cannot change replication filters like this) This may help me to pinpoint some other reason for my observations (see below).
  • If not, then how can I be sure the contents of that table reflect checksums of that specific server? Should I put a replication filter in place? This is actually discouraged by Percona's toolkit documentation.

Actual observations

I have seen one occurrence in my test setup in which somehow running pt-table-checksum did not detect the change deliberately made on the slave. Inspecting the checksum table contents showed the master's checksums. Running the tool on the slave itself then showed the checksums should have been different! This makes me believe there's a race condition going on and the reason why this is hard to reproduce.

Another hypothesis

Suppose the slave has drifted and the network connection is down for a short while the moment running pt-table-checksum. It will show it is unable to connect to the slave in the stderr output, but not fail. Short time later, networking is restored, and replication resumes, copying over the master's contents of the table percona.checksums. Then running pt-table-checksum --replicate-check-only, will provide false negatives. Am I right here? If so, how can I prevent this in this case?

This is a regular MySQL server/client using Percona's toolkit with it.

Versions:

  • pt-table-checksum 2.1.4
  • MySQL client: 5.5.24
  • MySQL server: 5.5.24-0ubuntu0.12.04.1-log (Ubuntu 12.04)

Best Answer

You're incorrect. It only runs the checksums on the master, not on the slaves.

Example query run:

REPLACE INTO `percona`.`checksums` (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc) SELECT 'db', 'table', '1', NULL, NULL, NULL, COUNT(*) AS cnt, COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#', `field1`, `field2`, `field3`, `field4`, `field5`, `field6`, `field7`)) AS UNSIGNED)), 10, 16)), 0) AS crc FROM `db`.`table` /*checksum table*/;

This works because it requires statement based replication, so the query is run fresh on every slave against their data, not the data the master found. This is documented on pt-table-checksums docs under LIMITATIONS

pt-table-checksum requires statement-based replication, and it sets
binlog_format=STATEMENT on the master, but due to a MySQL limitation
replicas do not honor this change. Therefore, checksums will not replicate
past any replicas using row-based replication that are masters for further replicas.

If you modify your data after the checksum query has replicated, of course it won't show that the data was changed, as it was correct when the checksum was calculated.

If you re-run the checksum after changing the data on the slave, then it should pick it up, assuming you're not using a n>2 tier replication setup.

re: Another hypothesis, it doesn't matter, the connection to the slave only matters for 'live' detection of mismatches, not for the checksums to actually calculate correctly. You shouldn't have false negatives, even with network interruptions.

Makes sense?