Before you perform any mysqldump to fully restore a Slave, you should consult the output of SHOW SLAVE STATUS\G. Let's start with a sample SHOW SLAVE STATUS\G:
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.48.20.253
Master_User: replicant
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.000254
Read_Master_Log_Pos: 858190247
Relay_Log_File: relay-bin.066069
Relay_Log_Pos: 873918
Relay_Master_Log_File: mysql-bin.000254
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 858190247
Relay_Log_Space: 873772
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
1 row in set (0.00 sec)
Please notice that there are two sets of replication coordinates from the Master:
- (Master_Log_File, Read_Master_Log_Pos)
- (Relay_Master_Log_File, Exec_Master_Log_Pos)
There is a major difference between them:
(Master_Log_File, Read_Master_Log_Pos) tells you the Master's log file and position of the last binlog event that the Slave's IO Thread read from the Master and placed in its Relay Logs.
(Relay_Master_Log_File, Exec_Master_Log_Pos) tells you the Master's log file and position of the last binlog event that the Slave's SQL Thread has actually executed from those Relay Logs.
The timestamps from these two coordinates help you figure out Seconds_Behind_Master.
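If you want to pull both coordinate pairs out in a script, something like this works (a minimal sketch; it assumes the mysql client can log in without prompting, e.g., via ~/.my.cnf):
#!/bin/bash
# Sketch: extract both sets of Master coordinates from SHOW SLAVE STATUS.
SS=$(mysql -Ae "SHOW SLAVE STATUS\G")
READ_FILE=$(echo "$SS" | awk '/ Master_Log_File:/      {print $2}')
READ_POS=$(echo "$SS"  | awk '/Read_Master_Log_Pos:/   {print $2}')
EXEC_FILE=$(echo "$SS" | awk '/Relay_Master_Log_File:/ {print $2}')
EXEC_POS=$(echo "$SS"  | awk '/Exec_Master_Log_Pos:/   {print $2}')
echo "IO Thread downloaded up to : ($READ_FILE, $READ_POS)"
echo "SQL Thread executed up to  : ($EXEC_FILE, $EXEC_POS)"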
Knowing these things, here is what you can do:
- Step 01) Run SHOW SLAVE STATUS\G
- Step 02) Get Relay_Master_Log_File and Exec_Master_Log_Pos from SHOW SLAVE STATUS\G (in the sample, that would be (mysql-bin.000254, 858190247))
- Step 03) STOP SLAVE;
- Step 04) CHANGE MASTER TO master_log_file='mysql-bin.000254', master_log_pos=858190247;
- Step 05) START SLAVE;
- Step 06) Wait 10 seconds
- Step 07) Run SHOW SLAVE STATUS\G and check Seconds_Behind_Master
If Seconds_Behind_Master is a number and eventually drops to zero, replication is fully reestablished.
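Steps 06 and 07 can be automated with a rough polling loop like this (a sketch; the 10-minute timeout is an arbitrary choice, and it assumes the mysql client can log in without prompting):
#!/bin/bash
# Sketch: poll Seconds_Behind_Master until it hits 0 or we give up.
for i in $(seq 1 60)
do
    SBM=$(mysql -Ae "SHOW SLAVE STATUS\G" | awk '/Seconds_Behind_Master:/ {print $2}')
    if [ "$SBM" = "0" ]    ; then echo "Replication caught up" ; exit 0 ; fi
    if [ "$SBM" = "NULL" ] ; then echo "Replication broken (check Last_Errno)" ; exit 1 ; fi
    echo "Seconds_Behind_Master: $SBM"
    sleep 10
done
echo "Still lagging after 10 minutes" ; exit 1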
After doing all this, if replication breaks because of a corrupt binary log from the Master, then you perform the last resort:
- Step 01) On the Master, run RESET MASTER; to erase all binary logs and start with a new one
- Step 02) On the Master, run this to create a proper dump for the Slave:
echo "STOP SLAVE;" > /root/MySQLData.sql
mysqldump --all-databases --routines --triggers --flush-privileges --master-data=1 >> /root/MySQLData.sql
echo "START SLAVE;" >> /root/MySQLData.sql
- Step 03) scp /root/MySQLData.sql over to the Slave and load it into MySQL on the Slave (see the sketch below)
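For Step 03, the commands would look something like this (slaveserver is a placeholder hostname; adjust credentials to taste):
scp /root/MySQLData.sql slaveserver:/root/
# then, on the Slave:
mysql -uroot -p < /root/MySQLData.sql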
Give it a Try !!!
SCENARIO #1
You may want to monitor Relay_Log_Space. It represents the sum total of the relay logs the Slave still has to process. When should you worry about it?
Given this scenario:
- Slave_IO_Running is Yes
- Slave_SQL_Running is No
This indicates that the IO Thread is still downloading entries from its Master's binary logs. The longer you take to address the SQL error, the more Relay_Log_Space will pile up. If the Master is in a heavy-write environment, relay logs will pile up quickly and put the Slave at risk of disk space issues.
If you do not want to monitor Relay_Log_Space, you can always set relay_log_space_limit. This forces a cap on the amount of disk space allowed for relay logs. For example, if you set this:
[mysqld]
relay_log_space_limit=8G
and restart mysql, the IO Thread stops collecting new entries from the Master whenever the sum of all relay log file sizes >= 8GB. After the SQL Thread is done processing the oldest relay log, that relay log gets rotated out and the IO Thread resumes getting entries from the Master where it left off. This provides a decent throttling mechanism for relay logs.
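If you go the monitoring route instead, a simple threshold check on Relay_Log_Space might look like this (a sketch; the 4GB threshold is just an example, and the same local-login assumption as above applies):
#!/bin/bash
# Sketch: warn when relay logs consume more disk space than expected.
THRESHOLD=$((4 * 1024 * 1024 * 1024))   # 4GB, an arbitrary example
RLS=$(mysql -Ae "SHOW SLAVE STATUS\G" | awk '/Relay_Log_Space:/ {print $2}')
if [ "$RLS" -ge "$THRESHOLD" ] ; then
    echo "WARNING: relay logs are using $RLS bytes; the SQL Thread may be stuck or lagging"
fi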
Another field you may want to look at is Last_SQL_Errno. Once Slave_IO_Running is Yes and Slave_SQL_Running is No, Last_SQL_Errno will be set to something nonzero. On very rare occasions, it is possible to have this:
- Slave_IO_Running is Yes
- Slave_SQL_Running is No
- Last_SQL_Errno is 0
When you have this unique phenomenon, mysqld has encountered an undocumented error (don't laugh, I have seen this).
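A monitoring check for the SQL-thread-down condition could be sketched like this (same local-login assumption as before):
#!/bin/bash
# Sketch: detect a stopped SQL Thread and report Last_SQL_Errno.
SS=$(mysql -Ae "SHOW SLAVE STATUS\G")
IO=$(echo "$SS"    | awk '/Slave_IO_Running:/  {print $2}')
SQL=$(echo "$SS"   | awk '/Slave_SQL_Running:/ {print $2}')
ERRNO=$(echo "$SS" | awk '/Last_SQL_Errno:/    {print $2}')
if [ "$IO" = "Yes" ] && [ "$SQL" = "No" ] ; then
    echo "SQL Thread is down, Last_SQL_Errno=$ERRNO"
fi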
SCENARIO #2
Something else to keep in mind (not necessarily for monitoring). Given this scenario:
- Slave_IO_Running is No
- Slave_SQL_Running is No
Under what three circumstances would this occur?
- Someone ran STOP SLAVE; by hand or in a backup script
- The option skip-slave-start is in /etc/my.cnf and mysql was started
- Issuing service mysql start --skip-slave-start or service mysql restart --skip-slave-start
Make sure your monitoring does not raise false positives when both replication threads were stopped intentionally.
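One way to filter those out is to alert only when exactly one of the two threads is down (a sketch):
#!/bin/bash
# Sketch: both threads stopped is usually intentional (STOP SLAVE, skip-slave-start),
# so only alert when exactly one thread has died.
SS=$(mysql -Ae "SHOW SLAVE STATUS\G")
IO=$(echo "$SS"  | awk '/Slave_IO_Running:/  {print $2}')
SQL=$(echo "$SS" | awk '/Slave_SQL_Running:/ {print $2}')
if [ "$IO" != "$SQL" ] ; then
    echo "ALERT: Slave_IO_Running=$IO Slave_SQL_Running=$SQL"
fi
When both threads are down, still confirm by hand (or against your backup schedule) that the stop was intentional.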
SCENARIO #3
Given this scenario:
- Slave_IO_Running is No
- Slave_SQL_Running is Yes
This quickly indicates that there is a communication error, caused by one of the following (a quick triage sketch follows the list):
- The Master went down
- mysqld on the Master went down
- Network Intermittency
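Here is one rough way to narrow it down from the Slave (10.48.20.253 and replicant come from the sample output above; the password is a placeholder):
#!/bin/bash
# Sketch: figure out which of the three communication failures you have.
MASTER=10.48.20.253   # Master_Host from SHOW SLAVE STATUS
if ! ping -c 3 "$MASTER" > /dev/null 2>&1 ; then
    echo "Master unreachable: server down or network intermittency"
elif ! mysql -h "$MASTER" -P 3306 -u replicant -pXXXX -e "SELECT 1" > /dev/null 2>&1 ; then
    echo "Host is up but mysqld is not answering on port 3306"
else
    echo "Master reachable; check Last_IO_Error in SHOW SLAVE STATUS for details"
fi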
Best Answer
Before I address your question, some terminology issues...
GTID uniquely identifies each thing replicated. This is an improvement on the previous replication technology -- but mostly when you have multiple machines and need to recover from some kind of failure. It helps automate things.
binlog_format = STATEMENT or ROW or MIXED refers to what is transferred from Master to Slave. STATEMENT (SBR) is the old way; ROW (RBR) is the new way. There are many subtle reasons why RBR is 'better'.
SHOW SLAVE STATUS provides, among other things, whether or not replication has stopped due to some error. It reports a crude measure of staleness (Seconds_Behind_Master), but not "consistency" (whether the changes arrived and were correctly applied).
You do not need a checksum on each row. It would be clumsy for you to add such. The network layer (TCP/IP, etc.) does checksumming to 'guarantee' delivery and to validate the contents of the delivery.
If you want to do your own consistency checking, I recommend Percona's pt-table-checksum. It takes care of a lot of edge cases that are not obvious to someone starting out doing the checking himself.
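A typical invocation, run on the Master, looks roughly like this (hostname, user, password, and database name are all placeholders; see Percona's documentation for the full option list):
# pt-table-checksum records its results in the percona.checksums table by default;
# that table replicates to the Slave, so differences show up in the DIFFS column
# of the output and can also be inspected on the Slave afterward.
pt-table-checksum --databases=mydb h=masterhost,u=checksum_user,p=XXXX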