I've set up MySQL replication, one master and one slave. Now I just starting to check the data for integrity on both sides by using Percona's pt-table-checksum
. Wonderful tool to detect any changes or drift on the slave.
From what I understand from its documentation, it runs on the master, writes checksums to the percona.checksums
table and runs similarly on the slaves connected (automatically detected) using a regular remote MySQL connection. However, the percona.checksums
table is actually also replicated! If data differs on the slave, the checksums from the master may be overwritten on the slave and not being detected.
Now, I am worried this might be causing my observed false negatives in detecting changes.
Does pt-table-checksum
have safeguards put into place for preventing replication on the percona.checksums
table?
- If so, how does this work? (as far as I can see, one cannot change replication filters like this) This may help me to pinpoint some other reason for my observations (see below).
- If not, then how can I be sure the contents of that table reflect checksums of that specific server? Should I put a replication filter in place? This is actually discouraged by Percona's toolkit documentation.
Actual observations
I have seen one occurrence in my test setup in which somehow running pt-table-checksum
did not detect the change deliberately made on the slave. Inspecting the checksum table contents showed the master's checksums. Running the tool on the slave itself then showed the checksums should have been different! This makes me believe there's a race condition going on and the reason why this is hard to reproduce.
Another hypothesis
Suppose the slave has drifted and the network connection is down for a short while the moment running pt-table-checksum
. It will show it is unable to connect to the slave in the stderr
output, but not fail. Short time later, networking is restored, and replication resumes, copying over the master's contents of the table percona.checksums
. Then running pt-table-checksum --replicate-check-only
, will provide false negatives. Am I right here? If so, how can I prevent this in this case?
This is a regular MySQL server/client using Percona's toolkit with it.
Versions:
- pt-table-checksum 2.1.4
- MySQL client: 5.5.24
- MySQL server: 5.5.24-0ubuntu0.12.04.1-log (Ubuntu 12.04)
Best Answer
You're incorrect. It only runs the checksums on the master, not on the slaves.
Example query run:
This works because it requires statement based replication, so the query is run fresh on every slave against their data, not the data the master found. This is documented on pt-table-checksums docs under LIMITATIONS
If you modify your data after the checksum query has replicated, of course it won't show that the data was changed, as it was correct when the checksum was calculated.
If you re-run the checksum after changing the data on the slave, then it should pick it up, assuming you're not using a n>2 tier replication setup.
re: Another hypothesis, it doesn't matter, the connection to the slave only matters for 'live' detection of mismatches, not for the checksums to actually calculate correctly. You shouldn't have false negatives, even with network interruptions.
Makes sense?