Mysql – How to you monitor if MySQL binlog files get corrupted

binlogmonitoringMySQL

I was wondering if there was any nice way to continuously monitor mysql master binlog files and detect if get corrupted.

Best Answer

There are four(4) things to look for to see if a binlog is corrupt on not:

  1. Binlog Magic Number
  2. Binlog Size
  3. mysqlbinlog
  4. MySQL Replication

BINLOG MAGIC NUMBER

Back on Dec 26, 2011, I wrote about a based-64 number that sits at the top of any binlog (hex value 0xfe 0x62 0x69 0x6e) regardless of MySQL version and regardless of binlog format (STATEMENT,ROW,MIXED). If this hex value is not in the header of the binlog, it is definitely corrupt.

BINLOG FILESIZE

For any binlog position, there is a a corresponding filesize. In other words, whenever mysqld records a SQL command, it does the following:

  • retrieves the current filesize to the statement size
  • calculates the size of the statement it will record (comments included)
    • Comment contains filesize + statement size
    • This sum is designated the binlog position
  • Statement is appended to the binlog
  • Comments are appended to the binlog

In light of this, you can run SHOW MASTER STATUS; and get something like this:

mysql> show master status;
+------------------+-----------+--------------+------------------+
| File             | Position  | Binlog_Do_DB | Binlog_Ignore_DB |
+------------------+-----------+--------------+------------------+
| mysql-bin.000109 | 569025336 |              |                  |
+------------------+-----------+--------------+------------------+
1 row in set (0.00 sec)

mysql>

If there are no writes going on (low writes,high reads), you could compare the filesize of the binlog with the reported binlog position.

Get both values and compare.

If the binlogs are in /var/lib/mysql

BINLOG_INFO=/tmp/binlog.txt
mysql -u... -p... -ANe"show master status" > ${BINLOG_INFO}
BINLOG_FIL=`cat ${BINLOG_INFO} | awk '{print $1}'`
BINLOG_POS=`cat ${BINLOG_INFO} | awk '{print $2}'`
cd /var/lib/mysql
BINLOG_SIZ=`ls -l ${BINLOG_FIL} | awk '{print $5}'`

If there are no writes going on, BINLOG_SIZ should be equal to BINLOG_POS

If there are writes going on, BINLOG_SIZ should be greater or equal to BINLOG_POS

You should never get BINLOG_SIZ greater than BINLOG_POS. If you do, binlog must be corrupt.

mysqlbinlog

The mysqlbinlog utility will dump a text representation of the binlog. If any of the output is garbled, then it it corrupt. However, don't jump to any conclusions if you see this:

BINLOG '
WFxKTRNJAAAAPwAAAKY/YwAAABsAAAAAAAEACHdlYmVkaTMwAA1QYXJ0bmVyQ29uZmlnAAQICA8P
BC0AYwAA
WFxKTRhJAAAAXAAAAAJAYwAAABsAAAAAAAEABP//8BoSAAAAAAAAggMAAAAAAAAJUEFDS0NPVU5U
Azc3MfAaEgAAAAAAAIIDAAAAAAAACVBBQ0tDT1VOVAM3NzI=
'/*!*/;

That is actually a normal signature at the beginning of a mysqlbinlog output. I wrote about this back on Feb 04, 2011 : MySQL master binlog corruption. This is the only exception. Any garbled output from mysqlbinlog that does not look like this is a definite sign of binlog corruption.

MySQL Replication

This is a double-edged sword because of how replication works. Nevertheless, it can still be helpful in two aspects. Keep in mind that a relay log has the exact same characteristics as a binary log.

On a Slave, the IO Thread would parse the incoming binlog entries from the Master and record them in the Slave's current relay log. It is actually running a process like mysqlbinlog. Thus, the mechanism used by the IO Thread is simple:

  1. Read the Comments above the Statement
  2. Get the Expected Position from the Comments
  3. Read the SQL Statement
    • Subtract Previous Read Position from Last Read Position
    • This Determines How Many Bytes to Read to Get the SQL Statement
  4. Append Comments and SQL Statement into the Current Relay Log adjusted by Relay Log Size

If Step 3 ever fails, this means the IO thread could not parse in the next incoming SQL.

The Slave will reveal this failure by writing in its error log something like this:

[ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236).
[ERROR] Slave I/O: Got fatal error 1236 from master when reading data from binary log: 'Client requested master to start replication from impossible position', Error_code: 1236
111014 20:25:48 [Note] Slave I/O thread exiting, read up to log 'mysql-bin.001067', position 183468345.

The fact that error number 1236 is a registered error condition shows that this impossible position scenario can happen upon failure to successfully read incoming SQL. I wrote about this on Oct 20, 2011 : How to handle the My SQL DB Master to Master replication when there was a disk space outage

The Slave knows what position it needed from the header (Binlog Comments) it read before expecting the incoming SQL. There are two causes for this:

CAUSE #1

If there was any network transmission noise or dropped packets, that slight "burp" interrupts the reading of the SQL statement. Thus, it is possible for the header comments to be written and not the SQL that follows.

CAUSE #2

If the binlog on the Master was corrupt to begin with, the IO thread on the Slave will fail for a more obvious reason: GARBAGE IN, GARBAGE OUT !!!

WORKAROUND

The error message example says Slave I/O thread exiting, read up to log 'mysql-bin.001067', position 183468345.. Here is how you can tell which CAUSE happened. Run the following on the Slave:

STOP SLAVE;
CHANGE MASTER TO
MASTER_LOG_FILE=''mysql-bin.001067'',
MASTER_LOG_POS=183468345;
START SLAVE;
SELECT SLEEP(10);
SHOW SLAVE STATUS\G

When you run SHOW SLAVE STATUS\G, if you see

       Slave_IO_Running: Yes
      Slave_SQL_Running: Yes

replication is fine. It was CAUSE #1

but if you see

       Slave_IO_Running: No
      Slave_SQL_Running: Yes

BANG !!!...The master binlog was corrupt anyway (CAUSE #2)

You would need to mysqldump the Master, load it onto the Slave, and setup replication from scratch.