I was wondering if there was any nice way to continuously monitor mysql master binlog files and detect if get corrupted.
Mysql – How to you monitor if MySQL binlog files get corrupted
binlogmonitoringMySQL
Related Solutions
Surprisingly, that's not gibberish.
That indeed appears at the top of binlogs whenever you do mysqlbinlog to a binary log generated using MySQL 5.1 and MySQL 5.5. You will not see that gibberish in binary logs for MySQL 5.0 and back.
This is why the start point for replication from an empty binary log is
- 107 for MySQL 5.5
- 106 for MySQL 5.1
- 98 for MySQL 5.0 and back
This is good to remember if you do MySQL Replication where the Master if MySQL 5.1 and the slave is MySQL 5.0. This could present a really big headache.
Replication from Master using 5.0 and Slave using 5.1 works fine, not the other way around.(According to MySQL Documentation, it is generally not supported for 3 reasons: 1) Binary Log Format, 2) Row-based Replication, 3) SQL Incompatibility).
Anyway, do a mysqlbinlog on the offending binary log on the master. If the resulting dump produces gibberish in the middle of the dump (which I have seen a couple of times in my DBA career) you may have to skip to position 98 (MySQL 5.0) or 106 (MySQL 5.1) or 107 (MySQL 5.5) of the master's next binary log and start replicating from there (SOB :( you may need to use MAATKIT tools mk-table-checksum and mk-table-sync to reload master changes not on the slave [if you want to be a hero]; even worse, mysqldump the master and reload the slave and start replication totally over [if you don't want to be a hero])
If the mysqlbinlog of the master is completely readable after the top gibberish you saw, it is possible the master's binary log is fine but the relay log on the slave is corrupt (due to transmission/CRC errors). If that's the case, just reload the relay logs by issuing the CHANGE MASTER TO command as follows:
STOP SLAVE;
CHANGE MASTER TO
MASTER_HOST='< master-host ip or DNS >',
MASTER_PORT=3306,
MASTER_USER='< usernmae >',
MASTER_PASSWORD='< password >',
MASTER_LOG_FILE='< MMMM >',
MASTER_LOG_POS=< PPPP >;
START SLAVE;
Where
- MMMM is the last file used from the Master that was last processed on the Slave
- PPPP is the last position used from the Master that was last processed on the Slave
You can get MMMM and PPPP by doing SHOW SLAVE STATUS\G
and using
- Relay_Master_Log_File for MMMM
- Exec_Master_Log_Pos for PPPP
Try it out and let me know !!!
BTW running CHANGE MASTER TO command erases the slave's current relay logs and starts fresh.
Whenever a Slave loses network connectivity, the IO thread simply dies and tries to reconnect every 60 seconds. You did the right thing in pointing the Slave back but let's clarify what to use to point back with.
For the give SHOW SLAVE STATUS\G
mysql> show slave status\G
*************************** 1. row ***************************
Slave_IO_State: Waiting for master to send event
Master_Host: 10.17.20.102
Master_User: replicant
Master_Port: 3306
Connect_Retry: 60
Master_Log_File: mysql-bin.002814
Read_Master_Log_Pos: 823078734
Relay_Log_File: relay-bin.007364
Relay_Log_Pos: 823078879
Relay_Master_Log_File: mysql-bin.002814
Slave_IO_Running: Yes
Slave_SQL_Running: Yes
Replicate_Do_DB:
Replicate_Ignore_DB:
Replicate_Do_Table:
Replicate_Ignore_Table:
Replicate_Wild_Do_Table:
Replicate_Wild_Ignore_Table:
Last_Errno: 0
Last_Error:
Skip_Counter: 0
Exec_Master_Log_Pos: 823078734
Relay_Log_Space: 823079071
Until_Condition: None
Until_Log_File:
Until_Log_Pos: 0
Master_SSL_Allowed: No
Master_SSL_CA_File:
Master_SSL_CA_Path:
Master_SSL_Cert:
Master_SSL_Cipher:
Master_SSL_Key:
Seconds_Behind_Master: 0
Master_SSL_Verify_Server_Cert: No
Last_IO_Errno: 0
Last_IO_Error:
Last_SQL_Errno: 0
Last_SQL_Error:
1 row in set (0.00 sec)
WHAT YOU SHOULD NOT PICK
Master_Log_File
represents the log file containing the last successfully executed SQL statement on the Master that was recorded in the Slave's Relay Logs.Read_Master_Log_Pos
represents the position withinMaster_Log_File
of the last successfully executed SQL statement on the Master that was recorded in the Slave's Relay Logs.
WHAT YOU SHOULD PICK
Relay_Master_Log_File
represents the log file containing the last successfully executed SQL statement on the Master that was executed on the Slave.Exec_Master_Log_Pos
represents the position withinRelay_Master_Log_File
of the last successfully executed SQL statement on the Master that was executed on the Slave.
In most cases, when caught early enough, Master_Log_File
and Relay_Master_Log_File
are the same. When there is significant replication lag or if the SQL thread breaks and the IO Thread keeps collecting, Master_Log_File
and Relay_Master_Log_File
will be different.
CONCLUSION
ALWAYS PICK Relay_Master_Log_File
and Exec_Master_Log_Pos
as a restart point.
Here are my other posts that reemphasize this paradigm:
Dec 05, 2011
: Does `Seconds_Behind_Master` show exact Slave Lag from Master?Jan 04, 2012
: MySQL Slave replication: Need to determine where replication left off to start slave in correct locationJan 24, 2012
: Mysql master-master replication auto flush old logs (Under Reason #3)Apr 04, 2012
: Seconds Behind Master in Cloud Service
CAVEAT #1
Please do not be concerned with binlog rotation on the Master when it comes to Replication's status. The Slave's IO Thread is very sensitive to the Master's change. Here is how you can verify this:
STEP 01 : On the Slave, run SHOW SLAVE STATUS\G
STEP 02 : On the Master
- For MySQL 5.5, run
FLUSH LOGS;
orFLUSH BINARY LOGS;
- Prior to MySQL 5.5, run
FLUSH LOGS;
STEP 03 : On the Slave, run SHOW SLAVE STATUS\G
You will see the Master_Log_File
and Read_Master_Log_Pos
change immedaitely.
CAVEAT #2
Since Relay_Master_Log_File
has the most recent binary from the Master to be executed on the Slave, that's the log you keep on the Master. From the above SHOW SLAVE STATUS\G
, you would run
PURGE BINARY LOGS TO 'mysql-bin.002814';
Related Question
- Thesql binlog file size stop growing
- MySQL : Extract SQL statements from binlog
- MySQL – Binlog Limitations
- MySQL – How to Close or Flush a Binlog File on a Remote Server
- MySQL Binlog – Add Query Executed User Information
- MySQL Binlog Filename Keeps Changing – Replication Issues
- MySQL – Handling 35GB Binlog Files in OS Cache
- Mysql: slave read binlog from local storage
Best Answer
There are four(4) things to look for to see if a binlog is corrupt on not:
BINLOG MAGIC NUMBER
Back on Dec 26, 2011, I wrote about a based-64 number that sits at the top of any binlog (hex value
0xfe 0x62 0x69 0x6e
) regardless of MySQL version and regardless of binlog format (STATEMENT,ROW,MIXED). If this hex value is not in the header of the binlog, it is definitely corrupt.BINLOG FILESIZE
For any binlog position, there is a a corresponding filesize. In other words, whenever mysqld records a SQL command, it does the following:
In light of this, you can run
SHOW MASTER STATUS;
and get something like this:If there are no writes going on (low writes,high reads), you could compare the filesize of the binlog with the reported binlog position.
Get both values and compare.
If the binlogs are in /var/lib/mysql
If there are no writes going on,
BINLOG_SIZ
should be equal toBINLOG_POS
If there are writes going on,
BINLOG_SIZ
should be greater or equal toBINLOG_POS
You should never get
BINLOG_SIZ
greater thanBINLOG_POS
. If you do, binlog must be corrupt.mysqlbinlog
The mysqlbinlog utility will dump a text representation of the binlog. If any of the output is garbled, then it it corrupt. However, don't jump to any conclusions if you see this:
That is actually a normal signature at the beginning of a mysqlbinlog output. I wrote about this back on Feb 04, 2011 : MySQL master binlog corruption. This is the only exception. Any garbled output from mysqlbinlog that does not look like this is a definite sign of binlog corruption.
MySQL Replication
This is a double-edged sword because of how replication works. Nevertheless, it can still be helpful in two aspects. Keep in mind that a relay log has the exact same characteristics as a binary log.
On a Slave, the IO Thread would parse the incoming binlog entries from the Master and record them in the Slave's current relay log. It is actually running a process like mysqlbinlog. Thus, the mechanism used by the IO Thread is simple:
If Step 3 ever fails, this means the IO thread could not parse in the next incoming SQL.
The Slave will reveal this failure by writing in its error log something like this:
The fact that error number 1236 is a registered error condition shows that this
impossible position
scenario can happen upon failure to successfully read incoming SQL. I wrote about this on Oct 20, 2011 : How to handle the My SQL DB Master to Master replication when there was a disk space outageThe Slave knows what position it needed from the header (Binlog Comments) it read before expecting the incoming SQL. There are two causes for this:
CAUSE #1
If there was any network transmission noise or dropped packets, that slight "burp" interrupts the reading of the SQL statement. Thus, it is possible for the header comments to be written and not the SQL that follows.
CAUSE #2
If the binlog on the Master was corrupt to begin with, the IO thread on the Slave will fail for a more obvious reason: GARBAGE IN, GARBAGE OUT !!!
WORKAROUND
The error message example says
Slave I/O thread exiting, read up to log 'mysql-bin.001067', position 183468345.
. Here is how you can tell which CAUSE happened. Run the following on the Slave:When you run
SHOW SLAVE STATUS\G
, if you seereplication is fine. It was
CAUSE #1
but if you see
BANG !!!...The master binlog was corrupt anyway (
CAUSE #2
)You would need to mysqldump the Master, load it onto the Slave, and setup replication from scratch.