Mysql – Is it safe to use pt-online-schema-change in a multimaster environment

MySQLperconapercona-toolkitpercona-tools

I have 2 MySQL servers with row-based replication between them. Both of them are masters and slaves for each other (active-active master-master setup).

If I understand it correctly pt-osc creates triggers to catch any changes while running. But from what I know triggers are not fired in a row-based replication environment. So I guess pt-osc is not able to catch changes made on the second master during the change, is it?

EDIT: While doing some tests and I saw that pt-osc was creating triggers on both masters which would cover changes from both sides. Still I'm quite unsure if I can safely do online changes in this environment.

Best Answer

The logic is more complex. (Caveat: This is not a definitive Answer to the Question, just more insight.)

Generally (probably in this case), the triggers should be only on the one machine.
RBR replicates the effect of inserts/etc, not the actual statements. That is what is needed on the Slave (aka other Master).
Are you 'writing' to both Masters? If so, that is a recipe for extra problems; I recommend you stop writing to the other Master while running pt-osc.

Also look into gh-ost, which uses the binlog in tricky ways. (Obviously, you have the binlog turned on.)

Related Solutions

Mysql – 3 Databases, Best Replication Setup

I wrote up an interesting layout last year which features DRBD pairs in two data centers (DC1,DC2) with as follows

DRBD Pair in DC1 (db1 and db2)
DBVIP for Primary of DRBD Pair1 is 10.1.2.30
DRBD Pair in DC2 (db3 and db4)
DBVIP for Primary of DRBD Pair2 is 10.1.2.40
Have MySQL Circular Replication Between DRBD Primaries
Have the 10.1.2.40 as Master_Host for DBRD Pair 1
Have the 10.1.2.30 as Master_Host for DBRD Pair 2

MySQL high availability, failover and replication with Latency

MySQL Replication : 1 Slave / Multiple Masters

Here is why I suggested this: Using two data centers, you setup automatic failover for DRBD Pair in one data center. Let the other DRBD Pair in the second data center be for DB disaser site with it own local redundancy and failover. Should you ever loses one data center, the other data center is fully read with it own local failover setup. Your app would just have to use the DBVIP of the other database center in such a catastrophic case.

Please keep in mind that using DRBD in conjunction with MySQL is only beneficial if all of your data uses the InnoDB Storage Engine. Hard failovers in DRBD could easily result in crashed MyISAM tables.

Here is another setup to consider:

As with DRBD setups, any DRBD Secondary would provide just a Disk-Level copy of your MySQL Folder. It is available as a warm standby. MySQL is not being run on the DRBD Secondary. If you want the third DB server to become hot standby, ditch DRBD altogether and use pure MySQL Replication. With three DB servers, using db3 at a remote site, simply setup the following:

+--> db1 --> db2 --> db3 -->+
^                           |
|                           V
+<--------------------------+

Using your rudimentary failover, now you have two hot standby servers. You just have to make sure each DB server has a unique server_id value. I also recommend using MySQL 5.5 because it uses SemiSync Replication which is more sensitive to communication failures and stop Replication better. You will have to setup the appropriate heartbeats and timeouts.

Mysql – pt-table-checksum help required

You say you are using pt-table-checksum 2.0.1. I would recommend updating to 2.1, as there are many improvements in the tool.

Next, let me address your test. You say the slave was not updated after the first or second commands that you ran. The second command looks to be trying to connect directly to the slave. pt-table-checksum won't report any differences unless the server you're connecting to has slaves.

Also, the --replicate-check-only option will not do any checksumming. (from the docs):

If specified, pt-table-checksum doesn’t checksum any tables. It checks replicas for differences found by previous checksumming, and then exits.

Your first command doesn't seem to be able to connect to the slave host, which is why it doesn't report any differences. Make sure the user/pass that is connecting to the master can also connect to the slave.

Now, as for your complex setup, you are right to worry about breaking replication. With some slaves replicating only certain tables, you should heed the warning here:

If the replicas are configured with any filtering options, you should be careful not to checksum any databases or tables that exist on the master and not the replicas.

You can specify which databases you want to checksum with the --databases option, and give a specific list of tables with the --tables option. Alternatively you can use the --ignore-databases and --ignore-tables options to provide a list of databases/tables to not checksum.

This will probably mean you will want separate pt-table-checksum commands based on which slaves you are trying to checksum. You will probably have to use the 'dsn' --recursion-method to accomplish this (I've never done it, personally)

As for load, pt-table-checksum comes with some options to throttle itself. Namely --max-load and --max-lag.

The tool keeps track of how quickly the server is able to execute the queries, and adjusts the chunks as it learns more about the server’s performance. It uses an exponentially decaying weighted average to keep the chunk size stable, yet remain responsive if the server’s performance changes during checksumming for any reason. This means that the tool will quickly throttle itself if your server becomes heavily loaded during a traffic spike or a background task, for example.

Best Answer

Related Solutions

Mysql – 3 Databases, Best Replication Setup

Mysql – pt-table-checksum help required

Related Question