Mysql – How to you monitor if MySQL binlog files get corrupted

binlogmonitoringMySQL

I was wondering if there was any nice way to continuously monitor mysql master binlog files and detect if get corrupted.

Best Answer

There are four(4) things to look for to see if a binlog is corrupt on not:

Binlog Magic Number
Binlog Size
mysqlbinlog
MySQL Replication

BINLOG MAGIC NUMBER

Back on Dec 26, 2011, I wrote about a based-64 number that sits at the top of any binlog (hex value 0xfe 0x62 0x69 0x6e) regardless of MySQL version and regardless of binlog format (STATEMENT,ROW,MIXED). If this hex value is not in the header of the binlog, it is definitely corrupt.

BINLOG FILESIZE

For any binlog position, there is a a corresponding filesize. In other words, whenever mysqld records a SQL command, it does the following:

retrieves the current filesize to the statement size
calculates the size of the statement it will record (comments included)
- Comment contains filesize + statement size
- This sum is designated the binlog position
Statement is appended to the binlog
Comments are appended to the binlog

In light of this, you can run SHOW MASTER STATUS; and get something like this:

mysql> show master status;
+------------------+-----------+--------------+------------------+
| File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+-----------+--------------+------------------+
| mysql-bin.000109 | 569025336 |              |                  |
+------------------+-----------+--------------+------------------+
1 row in set (0.00 sec)

mysql>

If there are no writes going on (low writes,high reads), you could compare the filesize of the binlog with the reported binlog position.

Get both values and compare.

If the binlogs are in /var/lib/mysql

BINLOG_INFO=/tmp/binlog.txt
mysql -u... -p... -ANe"show master status" > ${BINLOG_INFO}
BINLOG_FIL=`cat ${BINLOG_INFO} | awk '{print $1}'`
BINLOG_POS=`cat ${BINLOG_INFO} | awk '{print $2}'`
cd /var/lib/mysql
BINLOG_SIZ=`ls -l ${BINLOG_FIL} | awk '{print $5}'`

If there are no writes going on, BINLOG_SIZ should be equal to BINLOG_POS

If there are writes going on, BINLOG_SIZ should be greater or equal to BINLOG_POS

You should never get BINLOG_SIZ greater than BINLOG_POS. If you do, binlog must be corrupt.

mysqlbinlog

The mysqlbinlog utility will dump a text representation of the binlog. If any of the output is garbled, then it it corrupt. However, don't jump to any conclusions if you see this:

BINLOG '
WFxKTRNJAAAAPwAAAKY/YwAAABsAAAAAAAEACHdlYmVkaTMwAA1QYXJ0bmVyQ29uZmlnAAQICA8P
BC0AYwAA
WFxKTRhJAAAAXAAAAAJAYwAAABsAAAAAAAEABP//8BoSAAAAAAAAggMAAAAAAAAJUEFDS0NPVU5U
Azc3MfAaEgAAAAAAAIIDAAAAAAAACVBBQ0tDT1VOVAM3NzI=
'/*!*/;

That is actually a normal signature at the beginning of a mysqlbinlog output. I wrote about this back on Feb 04, 2011 : MySQL master binlog corruption. This is the only exception. Any garbled output from mysqlbinlog that does not look like this is a definite sign of binlog corruption.

MySQL Replication

This is a double-edged sword because of how replication works. Nevertheless, it can still be helpful in two aspects. Keep in mind that a relay log has the exact same characteristics as a binary log.

On a Slave, the IO Thread would parse the incoming binlog entries from the Master and record them in the Slave's current relay log. It is actually running a process like mysqlbinlog. Thus, the mechanism used by the IO Thread is simple:

Read the Comments above the Statement
Get the Expected Position from the Comments
Read the SQL Statement
- Subtract Previous Read Position from Last Read Position
- This Determines How Many Bytes to Read to Get the SQL Statement
Append Comments and SQL Statement into the Current Relay Log adjusted by Relay Log Size

If Step 3 ever fails, this means the IO thread could not parse in the next incoming SQL.

The Slave will reveal this failure by writing in its error log something like this:

[ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236).
[ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position', Error_code: 1236
111014 20:25:48 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.001067', position 183468345.

The fact that error number 1236 is a registered error condition shows that this impossible position scenario can happen upon failure to successfully read incoming SQL. I wrote about this on Oct 20, 2011 : How to handle the My SQL DB Master to Master replication when there was a disk space outage

The Slave knows what position it needed from the header (Binlog Comments) it read before expecting the incoming SQL. There are two causes for this:

CAUSE #1

If there was any network transmission noise or dropped packets, that slight "burp" interrupts the reading of the SQL statement. Thus, it is possible for the header comments to be written and not the SQL that follows.

CAUSE #2

If the binlog on the Master was corrupt to begin with, the IO thread on the Slave will fail for a more obvious reason: GARBAGE IN, GARBAGE OUT !!!

WORKAROUND

The error message example says Slave I/O thread exiting, read up to log 'mysql-bin.001067', position 183468345.. Here is how you can tell which CAUSE happened. Run the following on the Slave:

STOP SLAVE;
CHANGE MASTER TO
MASTER_LOG_FILE=''mysql-bin.001067'',
MASTER_LOG_POS=183468345;
START SLAVE;
SELECT SLEEP(10);
SHOW SLAVE STATUS\G

When you run SHOW SLAVE STATUS\G, if you see

       Slave_IO_Running: Yes
      Slave_SQL_Running: Yes

replication is fine. It was CAUSE #1

but if you see

       Slave_IO_Running: No
      Slave_SQL_Running: Yes

BANG !!!...The master binlog was corrupt anyway (CAUSE #2)

You would need to mysqldump the Master, load it onto the Slave, and setup replication from scratch.

Related Solutions

MySQL – Master Binlog Corruption

Surprisingly, that's not gibberish.

That indeed appears at the top of binlogs whenever you do mysqlbinlog to a binary log generated using MySQL 5.1 and MySQL 5.5. You will not see that gibberish in binary logs for MySQL 5.0 and back.

This is why the start point for replication from an empty binary log is

107 for MySQL 5.5
106 for MySQL 5.1
98 for MySQL 5.0 and back

This is good to remember if you do MySQL Replication where the Master if MySQL 5.1 and the slave is MySQL 5.0. This could present a really big headache.

Replication from Master using 5.0 and Slave using 5.1 works fine, not the other way around.(According to MySQL Documentation, it is generally not supported for 3 reasons: 1) Binary Log Format, 2) Row-based Replication, 3) SQL Incompatibility).

Anyway, do a mysqlbinlog on the offending binary log on the master. If the resulting dump produces gibberish in the middle of the dump (which I have seen a couple of times in my DBA career) you may have to skip to position 98 (MySQL 5.0) or 106 (MySQL 5.1) or 107 (MySQL 5.5) of the master's next binary log and start replicating from there (SOB :( you may need to use MAATKIT tools mk-table-checksum and mk-table-sync to reload master changes not on the slave [if you want to be a hero]; even worse, mysqldump the master and reload the slave and start replication totally over [if you don't want to be a hero])

If the mysqlbinlog of the master is completely readable after the top gibberish you saw, it is possible the master's binary log is fine but the relay log on the slave is corrupt (due to transmission/CRC errors). If that's the case, just reload the relay logs by issuing the CHANGE MASTER TO command as follows:

STOP SLAVE;
CHANGE MASTER TO
MASTER_HOST='< master-host ip or DNS >',
MASTER_PORT=3306,
MASTER_USER='< usernmae >',
MASTER_PASSWORD='< password >',
MASTER_LOG_FILE='< MMMM >',
MASTER_LOG_POS=< PPPP >;
START SLAVE;

Where

MMMM is the last file used from the Master that was last processed on the Slave
PPPP is the last position used from the Master that was last processed on the Slave

You can get MMMM and PPPP by doing SHOW SLAVE STATUS\G and using

Relay_Master_Log_File for MMMM
Exec_Master_Log_Pos for PPPP

Try it out and let me know !!!

BTW running CHANGE MASTER TO command erases the slave's current relay logs and starts fresh.

Mysql – Replication master binlog rotation when network is unavailable

Whenever a Slave loses network connectivity, the IO thread simply dies and tries to reconnect every 60 seconds. You did the right thing in pointing the Slave back but let's clarify what to use to point back with.

For the give SHOW SLAVE STATUS\G

mysql> show slave status\G
*************************** 1. row ***************************
               Slave_IO_State: Waiting for master to send event
                  Master_Host: 10.17.20.102
                  Master_User: replicant
                  Master_Port: 3306
                Connect_Retry: 60
              Master_Log_File: mysql-bin.002814
          Read_Master_Log_Pos: 823078734
               Relay_Log_File: relay-bin.007364
                Relay_Log_Pos: 823078879
        Relay_Master_Log_File: mysql-bin.002814
             Slave_IO_Running: Yes
            Slave_SQL_Running: Yes
              Replicate_Do_DB:
          Replicate_Ignore_DB:
           Replicate_Do_Table:
       Replicate_Ignore_Table:
      Replicate_Wild_Do_Table:
  Replicate_Wild_Ignore_Table:
                   Last_Errno: 0
                   Last_Error:
                 Skip_Counter: 0
          Exec_Master_Log_Pos: 823078734
              Relay_Log_Space: 823079071
              Until_Condition: None
               Until_Log_File:
                Until_Log_Pos: 0
           Master_SSL_Allowed: No
           Master_SSL_CA_File:
           Master_SSL_CA_Path:
              Master_SSL_Cert:
            Master_SSL_Cipher:
               Master_SSL_Key:
        Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
                Last_IO_Errno: 0
                Last_IO_Error:
               Last_SQL_Errno: 0
               Last_SQL_Error:
1 row in set (0.00 sec)

WHAT YOU SHOULD NOT PICK

Master_Log_File represents the log file containing the last successfully executed SQL statement on the Master that was recorded in the Slave's Relay Logs.
Read_Master_Log_Pos represents the position within Master_Log_File of the last successfully executed SQL statement on the Master that was recorded in the Slave's Relay Logs.

WHAT YOU SHOULD PICK

Relay_Master_Log_File represents the log file containing the last successfully executed SQL statement on the Master that was executed on the Slave.
Exec_Master_Log_Pos represents the position within Relay_Master_Log_File of the last successfully executed SQL statement on the Master that was executed on the Slave.

In most cases, when caught early enough, Master_Log_File and Relay_Master_Log_File are the same. When there is significant replication lag or if the SQL thread breaks and the IO Thread keeps collecting, Master_Log_File and Relay_Master_Log_File will be different.

CONCLUSION

ALWAYS PICK Relay_Master_Log_File and Exec_Master_Log_Pos as a restart point.

Here are my other posts that reemphasize this paradigm:

Dec 05, 2011 : Does `Seconds_Behind_Master` show exact Slave Lag from Master?
Jan 04, 2012 : MySQL Slave replication: Need to determine where replication left off to start slave in correct location
Jan 24, 2012 : Mysql master-master replication auto flush old logs (Under Reason #3)
Apr 04, 2012 : Seconds Behind Master in Cloud Service

CAVEAT #1

Please do not be concerned with binlog rotation on the Master when it comes to Replication's status. The Slave's IO Thread is very sensitive to the Master's change. Here is how you can verify this:

STEP 01 : On the Slave, run SHOW SLAVE STATUS\G

STEP 02 : On the Master

For MySQL 5.5, run FLUSH LOGS; or FLUSH BINARY LOGS;
Prior to MySQL 5.5, run FLUSH LOGS;

STEP 03 : On the Slave, run SHOW SLAVE STATUS\G

You will see the Master_Log_File and Read_Master_Log_Pos change immedaitely.

CAVEAT #2

Since Relay_Master_Log_File has the most recent binary from the Master to be executed on the Slave, that's the log you keep on the Master. From the above SHOW SLAVE STATUS\G, you would run

PURGE BINARY LOGS TO 'mysql-bin.002814';