Running SHOW SLAVE STATUS
is not the way to check if you have a sync problem. It only tells you if you ran into the sync problem in a certain way. Your data can be way out of sync with replication still running happily. Checking the checksum is the more proper way indeed, and can be done automatically for you with percona table checksum (http://www.percona.com/doc/percona-toolkit/2.2/pt-table-checksum.html), formerly called maatkit.
With a single write master <> master setup, I don't need to worry about setting the auto_increment_offset correct?
Yes, you are right. I have setup Master-Master setups for years with one Write Master. I never touched the auto_increment options. Never had an incident.
Does MySQL replicate all databases by default, or do I need to explicitly identify them via binlog-do-db?
Yes, it is by default. Although binlog-do-db exists, I would not recommend it since you are doing Master-Master. If you have additional slaves, it is best to setup filtering on the slave. That way, you have all binlog events to point-in-time recovery and other incidentals. Such incidentals might include Star Topologies (which feature Distribution Masters, a Master that does not house any data, only binlogs Such binlogs could be filter with binlog_do_db to reduce binlog traffic to slaves).
Do I need to do an export from master1 and import to master2, before I start replication ?
You do need to to export from master1 and import to master2.
can I start master 2 to pick up all transactions over the last several months?
The only way you can do that is if you had every binlog since you loaded master1 aznd never erased any of the binlogs. Most people rotate their binlogs, making this impossible.
I need to setup a "replicator" user (with replication perms) on both master1 and master2?
Yes
EPILOGUE
I have setup ucarp over the years as the failover mechanism for the floating IP. Most use Linux Heartbeat, Pacemaker, and others. So, you are in the ballpark of what needs to be done.
Best Answer
As long as you do not perform any INSERT/UPDATE/DELETE statements directly on the Slave, your Slave should be just fine. Otherwise, MySQL Replication could break if you INSERT an new row in mydb.mytable on the Slave and, via Replication, the Slave later detects an INSERT of a row to mydb.mytable with the same PRIMARY KEY. This produces error 1062 (Duplicate Key).
The only way you could write to Slave without breaking MySQL Replication is this:
CREATE DATABASE db4;
on the SlaveTo detect a Master and Slave being out of sync without downloading any tools, pick any table and run CHECKSUM TABLE against a table on the Master and the Slave's copy of the same table.
EXAMPLE
If you have a table
mydb.mytable
, run the command against it:If the values do not come back the same, then something is out-of-sync.
If you want to examine a bunch of table in bulk, you can down Percona's MAATKIT. You will need two specific tools (Percona also has the Percona Toolkit that they themselves forked from MAATKIT which is now being promoted more)
or
pt-table-checksum
will perform a CHECKSUM TABLE against all tables on Master and Slave. You can configure it to do all databases on just specific ones.pt-table-sync
can be run on a Slave against any table. Using the --print and --sync-to-master options, you can see what SQL statements need to be executed on the Slave to have it perfectly match the Master. This tool does not work with table that lacks a PRIMARY KEY or UNIQUE KEY.I have used MAATKIT for years. I still do. I have not tried the Percona Toolkit yet, but I am sure it should be of the same quality as MAATKIT.