Mysql – Restoring a MySQL database to a failed master

multi-masterMySQLreplication

I have a master-master configuration in MySQL with two servers. One server should stay live on the network to serve requests (call it server A) and the other should be taken offline to push new code changes (server B).

My idea originally was that after running STOP SLAVE on both servers, that server B could be shut down, updated, and even have a new database schema put in. After this, I thought that I could simply START SLAVE on server B and have the entire database from server A replicated/mirrored back over to server B. However, this is not the case. Restarting the slave and doing a CHANGE MASTER TO (…) and syncing up the log files does not replicate old changes like I want it to: it only replicates new writes from that point on.

I am looking for a way to bring server B up to speed with the latest database from server A, and then have server B continue to replicate changes in a master-master setup. Then I can continue the sequence of server upgrades by doing the same process but keeping server B online only.

Any solutions which require locking the tables will not work since I need to do this change without any downtime. Is this possible?

Best Answer

Slow ALTER in Master-Master

In Master-Master, each server is acting both as a Master and a Slave to the other server.

STOP SLAVE -- When you later do START SLAVE, the stuff that was performed on the Master will be applied to the Slave. If you did, say, a big ALTER on a big table, that ALTER will run, taking a long time. This is probably the "downtime" you are worried about?

The alternative is to do the lengthy process twice.

Stop client usage of B.
On B, turn off writing to the binlog
On B, do the ALTER(s), etc.
On B, turn on writing to the binlog
Wait for replication to catch up. (It was probably blocked by the ALTER.)
Flip -- that is stop talking to A and start talking to B.
On A, (steps 2,3,4,5) binlog off, ALTER, binlog on, catch up

Alternatives

In Galera, this is "RSU" (as opposed to "TOI").

See also pt-online-schema-change -- this tool (percona.com) allows for doing time consuming maintenance on a table with essentially no downtime.

5.7 can now do many ALTERs with ALGORITHM=INPLACE, hence little or no downtime.

Putting the burden on the Code

Keep in mind that some code changes involve schema changes. Sometimes they can be done in a "backward compatible" manner, thereby obviating stuff like this.

In the extreme, you might need to double up the code. That is, have it capable of handing the old schema and the new schema at the same time. This complicates the code, but eliminates coordination problems between code and schema. (The code should eventually be cleaned up to throw out the 'old' variant.)

That last way becomes especially important when you have Sharding, multiple clients, or other cases of "lots of servers to update".

STEP01 : Activate Binary Logging on the Old Server

Step01-a) Add this to /etc/my.cnf on the Master

[mysqld]
server-id=101120
log-bin=mysql-bin

Step01-b) # service mysql restart

After STEP01, you should see mysql-bin.000001 and mysql-bin.index in /var/lib/mysql

STEP02 : Perform mysqldump

On the Master, you can mysqldump the data and record from what point in time it happened.

# service mysql restart --skip-networking --skip-grants
# mysqldump --single-transaction --master-data=2 --all-databases --routines --triggers > MySQLData.sql
# service mysql restart

What this does is record the Master Log File and Position the moment the mysqldump started as a comment. You can visibly see it when you view line 22:

# head -22 MySQLData.sql | tail -1

STEP03 : Load the mysqldump into the Slave

Execute the mysql client loading the mysqldump into the Slave's mysql instance

# mysql -h10.1.1.30 -uroot -p < MySQLData.sql

STEP04 : Create MySQL Replication User on the Master

# mysql -uroot -p -e"GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'replicator'@'10.64.51.%' IDENTIFIED BY 'replpassword'"

STEP05 : Setup the Slave with a Separate Server ID

Add this to /etc/my.cnf on the Slave

[mysqld]
server-id=101130

and # service mysql restart

STEP06 : Setup the Replication on the Slave

Goto the mysql client and run the following command

mysql> CHANGE MASTER TO
    MASTER_HOST='10.1.1.20',
    MASTER_PORT=3306,
    MASTER_USER='replicator',
    MASTER_PASSWORD='replpassword',
    MASTER_LOG_FILE='mysql-bin.000001',
    MASTER_LOG_POS=1;

STEP07 : Setup the point-in-time Master Log and Position

Back in STEP02, I mentioned viewing the point-in-time position using

# head -22 MySQLData.sql | tail -1

You should see something like this:

#CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000015', MASTER_LOG_POS=122957100;

Run it as command in the mysql client on the Slave

mysql> CHANGE MASTER TO MASTER_LOG_FILE='mysql-bin.000015', MASTER_LOG_POS=122957100;

Afterwards, run this command

mysql> SHOW SLAVE STATUS\G

You should see something like this:

*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.64.113.232
                  Master_User: replicant
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.000015
          Read_Master_Log_Pos: 122957100
               Relay_Log_File: relay-bin.003666
                Relay_Log_Pos: 122957100
        Relay_Master_Log_File: mysql-bin.000015
             Slave_IO_Running: No
            Slave_SQL_Running: No
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 106
              Relay_Log_Space: 106
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
1 row in set (0.00 sec)

STEP08 : Launch Replication

Start up replication with this:

mysql> START SLAVE;

Afterwards, run this command again

mysql> SHOW SLAVE STATUS\G

If you see this

             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes

CONGRATULATIONS, MySQL Replication is Working !!!

Thesql replication for certain dababases

I would avoid the replicate-do-db option due to the restriction on cross-database replication and users not fully qualifying a database name or not specifying "use dbname;" (it will execute on master but not replicate to slave).

Use this option instead (put it in my.cnf):

replicate-wild-do-table=dbname1.%

replicate-wild-do-table=dbname2.%

Or you can specify the tables, replace % with table name, one on each line. http://dev.mysql.com/doc/refman/5.0/en/replication-options-slave.html#option_mysqld_replicate-wild-do-table

This link should help get you started: http://dev.mysql.com/doc/refman/5.0/en/replication-solutions-partitioning.html