Mysql – Why is restoration from binary logs so slow

binlogMySQLmysql-5.5mysqlbinlog

We have a MySQL 5.5 server with binary logging enabled and a backup script that collects the binary logs and, once a day, generates a complete database dump with mysqldump. In testing database restoration from the dump & binlogs, we found that the latter are an order of magnitude slower.

Specifically:

Restoring from a 92.8 MB database dump with mysql dbname < dbdump.sql took 3m 26.526s.
Restoring all of the binary logs from after the dump with mysqlbinlog -d dbname --start-position=107 mysql-bin.000{727..755} | mysql dbname took 38m 10.090s. The total size of the binlog files was 56.8 MB, and the size of the output from mysqlbinlog was 66.4 MB.
Combining the above two steps by just restoring from a (93.2 MB) dump taken after the events the binlogs cover took 2m 15.406s.

Is this time disparity normal? Can anything be done to reduce it?

Best Answer

You must understand what the binary logs are: they store all the queries (in STATEMENT format) or row changes (in ROW format) that happened since they were recorded. So, your command:

mysqlbinlog -d dbname --start-position=107 mysql-bin.000{727..755} | mysql dbname

basically executes all insert, updates & deletes that occurred to the server during those almost 30 binlogs, serially. That is why recovering from the binary logs is only useful for point in time recovery (recovering the latest changes since the last backup).

There are some things that you could do to reduce the application time, like reducing temporarily the durability and consistency constraints of the database (as you can always replay the steps) -innodb_flush_log_at_trx_commit, disable the binary log, the double write, etc.- or try using ROW binary logs only, which may increase the size on disk but make them faster to apply.

Alternatively, you could reduce the number of binlogs that you have to apply by creating more frequent full backups (the size of your full backups is not precisely big), creating real incremental or differential backups with tools like xtrabackup- or using lagged slaves for disaster recovery. This is because both mysqldump and mysqlbinlog recover data in a logical way- physical backups may be faster in some cases, specially for full recovery.

Obviouslly, there could be specific issues as well, for example, I have sometimes found that replaying LOAD DATA statements is slower than I would have thought.

Related Solutions

Mysql – Error reading MySQL 5.5 binary log using thesqlbinlog

Since the binary log on the Master is corrupt, there is nothing else you can do. My condolences. Just skip to the next binlog:

STOP SLAVE;
CHANGE MASTER TO master_log_file='ibm-pr-slglcd-01.000076',master_log_pos=107;
START SLAVE;

Of course, 107 is the start position for MySQL 5.5 binlogs. If the Master is 5.1 (or 5.0.95), use 106. If the Master is prior to 5.1, use 98.

If Master BinLog Corruption is frequent, you may need to consider using a smaller size for the binary logs on the Master (perhaps 128M instead of the default 1G):

[mysqld]
max_binlog_size=128M

This will not stop corruption, but lowering the size it will minimize data loss from 1G to 128M.

MySQL : Extract SQL statements from binlog

--base64-output=DECODE-ROWS --verbose

This is the format you need to be using if you are reviewing the binlog manually... it extracts the maximum amount of useful data from the log... but it will help if you understand how binary logging works in MySQL, first.

RDS (at least for MySQL 5.5 and 5.6) uses binlog_format = MIXED. This means the server may choose, on a query-by-query basis, to log either the actual SQL statement that was executed ("statement" log), or the actual rows changed by the query ("row" log) but not both.

The "commented SQL" that you are seeing is, as explained in the documentation, a human-readable pseudo-sql reconstruction of the actual data that was changed on the master by your query, for events logged using the ROW format. The body of the query that made the change is not preserved, because it is not needed for replication -- slave servers can replicate the exact same changes to their tables as occurred on the master, based on these before ("where") and after ("set") row images.

You can't "get" the original SQL from these events because it's not being saved to the binary log when events are written like this. In the example you posted, it's telling you that the value of the 3rd column changed.

###   @3=1373178370 # old value

###   @3=1378479295 # new value

This "pseudo-SQL" is commented out because it isn't actual valid SQL that a server could execute. It's eyeball-compatible-only. (For a server to replay these row events, it needs to see the BINLOG blocks, which are a base64-encoded version of the information that mysqlbinlog is decoding for you, here).

The column names are also not preserved in the binary log for row-based events, because since they should be the same on master and slave, that information is not needed.

(Edit: you mentioned RDS but re-reading your question, it seems that you may be trying to apply these logs to RDS rather than having retrieved them from RDS; retrieving logs from RDS is only possible with RDS/MySQL 5.6. The conclusion, then, is that the server that originated these logs is either logging in MIXED or ROW mode.)

Update: I see what you are doing now. To summarize the link you posted, you're trying to (1) take a point-in-time snapshot of an existing server with mysqldump, (2) restore that snapshot to an RDS instance, then (3) use mysqlbinlog to play the transactions forward from the binlog to sync the RDS server to your existing production server, and finally (4) migrate your application to RDS.

Bottom line: You are overthinking the nature of the problem and are concerned without cause. MySQL Server knows how to interpret the output of mysqlbinlog without either the --base64-output or --verbose flags and apply the changes it sees, whether or not the original query SQL is logged because when the query isn't logged, the actual row changes are logged.

When you run mysqlbinlog you see what looks almost like noise. Example:

#080828 15:03:08 server id 1  end_log_pos 442   Delete_rows: table id 17 flags: STMT_END_F
...
BINLOG '
fAS3SBMBAAAALAAAANoAAAAAABEAAAAAAAAABHRlc3QAAXQAAwMPCgIUAAQ=
fAS3SBcBAAAAKAAAAAIBAAAQABEAAAAAAAEAA//8AQAAAAVhcHBsZQ==
'/*!*/;

This is a base-64-encoded version of very tightly-packed binary "row images" containing the actual change that needs to be made to the database. A MySQL server knows how to decode and apply this change. Conversely, when you decode-rows and verbose, you're asking mysqlbinlog to decode the binary log for human eyeballs.

Leave your server's binlog_format setting as it is. The binary logs should work without a problem.

But now, forget everything you were planning because literally in the last 24 hours, Amazon has announced a much better way: they have opened up the ability to temporarily connect an RDS instance, as a slave, to a non-RDS master, to allow the RDS instance to be brought into real-time automatically prior to being promoted to master.

This is exactly the solution to what you are trying to accomplish and is much easier and less error prone.

To import data from a MySQL database that is not in Amazon RDS, you can configure replication from that database to an instance of MySQL running in Amazon RDS. The MySQL database you are migrating from can be running either on-premise in your data center, or on an Amazon EC2 instance. The MySQL instance in Amazon RDS must be running either version 5.5.33 or 5.6.13. Using replication to migrate the data reduces downtime.

^{— http://docs.aws.amazon.com/AmazonRDS/latest/UserGuide/MySQL.Procedural.Importing.NonRDSRepl.html}

No kidding.

As a DBA, I've been very reluctant to use RDS in my own production networks, preferring to retain full control over all aspects of the server... and by "reluctant," I mean "we didn't." I finally relented when Amazon made it possible in RDS for MySQL 5.6 to use the RDS instance as master to a non-RDS slave ... and, now, using an RDS master as slave to a legacy server to simplify migration seems like another positive sign for the viability of RDS.

Best Answer

Related Solutions

Mysql – Error reading MySQL 5.5 binary log using thesqlbinlog

MySQL : Extract SQL statements from binlog

Related Question