You're incorrect. pt-table-checksum only runs the checksum queries on the master, not on the slaves.
Example query run:
REPLACE INTO `percona`.`checksums`
  (db, tbl, chunk, chunk_index, lower_boundary, upper_boundary, this_cnt, this_crc)
SELECT 'db', 'table', '1', NULL, NULL, NULL,
  COUNT(*) AS cnt,
  COALESCE(LOWER(CONV(BIT_XOR(CAST(CRC32(CONCAT_WS('#',
    `field1`, `field2`, `field3`, `field4`, `field5`, `field6`, `field7`)) AS UNSIGNED)), 10, 16)), 0) AS crc
FROM `db`.`table` /*checksum table*/;
This works because pt-table-checksum requires statement-based replication: the query is re-executed on every slave against the slave's own data, not the data the master found. This is documented in the pt-table-checksum docs under LIMITATIONS:
pt-table-checksum requires statement-based replication, and it sets
binlog_format=STATEMENT on the master, but due to a MySQL limitation
replicas do not honor this change. Therefore, checksums will not replicate
past any replicas using row-based replication that are masters for further replicas.
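As an illustration, a typical run is pointed at the master only; the checksum queries above then replicate down on their own. Host, user, and database names here are placeholders, not taken from your setup:

```shell
# Run against the master only; the checksum REPLACE..SELECT statements
# replicate to the slaves and are re-executed against each slave's data.
pt-table-checksum \
  --host=master.example.com \
  --user=checksum_user --ask-pass \
  --databases=db \
  --replicate=percona.checksums
```

After the run, differences are found by comparing `this_crc`/`this_cnt` against `master_crc`/`master_cnt` in the `percona.checksums` table on each slave (or by using pt-table-checksum's companion tooling).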
If you modify your data after the checksum query has replicated, the checksum of course won't reflect that change; the data was still correct at the time the checksum was calculated.
If you re-run the checksum after changing the data on the slave, it should pick the difference up, assuming you're not using an n>2-tier replication setup.
Re: your other hypothesis, it doesn't matter. The connection to the slave is only used for 'live' detection of mismatches, not for the checksums themselves to be calculated correctly. You shouldn't get false negatives, even with network interruptions.
Makes sense?
You say:
databases have been created on the slaves and filled with data.
The easiest way of correcting this is to take a mysqldump of the specific databases on the slave, drop those databases on the slave, and import the dumpfile onto the master. The slaves will then replicate the data back down.
This of course gets trickier if the divergence is not whole databases or tables but specific rows inside a table.
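The whole-database case can be sketched roughly as follows. The host names, user, and database name are placeholders for your environment, and you should verify there are no replication filters in play before doing this:

```shell
# Placeholders: master.example.com, slave.example.com, database db1.

# 1. Dump the diverged database from the slave.
mysqldump -h slave.example.com -u root -p --databases db1 > db1.sql

# 2. Drop it on the slave so the master's copy can replicate back cleanly.
mysql -h slave.example.com -u root -p -e "DROP DATABASE db1"

# 3. Import the dump on the master; the CREATE and INSERT statements
#    then replicate down to all slaves.
mysql -h master.example.com -u root -p < db1.sql
```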
Best Answer
I assume you have a group with transactions A and B, and your diverged server has some transactions C that are not in the group, so it cannot join.
In this case you can set up a group member (in single-primary mode it must be the primary) as a slave of the diverged member using ordinary asynchronous master-slave replication. This way all the data that was written to the diverged member is replicated into the group.
I'm assuming 1) all the data on the diverged member has GTIDs that were not reused by the group, and 2) the data is safe to replicate into the group, i.e., it is not outdated and has no other data-logic issue that only you can know about.
Once the group has this data, you can stop the master-slave connection and add the diverged member to the group.
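A minimal sketch of that temporary channel, assuming GTID auto-positioning is available and using placeholder host and credentials (the channel name is also made up for illustration):

```sql
-- On the group's primary: pull the diverged member's transactions in
-- over a regular async replication channel.
CHANGE MASTER TO
  MASTER_HOST = 'diverged.example.com',
  MASTER_USER = 'repl',
  MASTER_PASSWORD = '<password>',
  MASTER_AUTO_POSITION = 1
  FOR CHANNEL 'recover_diverged';
START SLAVE FOR CHANNEL 'recover_diverged';

-- Once the diverged member's transactions have replicated into the group,
-- tear the channel down, then rejoin the diverged server to the group.
STOP SLAVE FOR CHANNEL 'recover_diverged';
RESET SLAVE ALL FOR CHANNEL 'recover_diverged';
```

With MASTER_AUTO_POSITION the channel only fetches the GTIDs the group does not already have, which is exactly the diverged transaction set C from above.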