Mysql – Binary Log Format in MySQL

MySQL

Reference manual for MySQL 5.6 states that " Some changes, however, still use the statement-based format. Examples include all DDL (data definition language) statements such as CREATE TABLE, ALTER TABLE, or DROP TABLE. "

Does this statement means that even if we have ROW format for binary logs all DDLs will be logged in binary log as statement based? How does this affect replication?

Kindly help me to understand this.

Best Answer

Even though binlog_format is a global variable, it doesn't exactly set the "format" of the binary log itself -- it sets the format in which DML events are logged for execution by the slave.

In STATEMENT mode, the actual DML queries that change data on the master are written to the log as they were received by the master, and subsequently executed on the slave.

In ROW mode, DML queries are not logged as SQL; instead, "images" of the actual changed rows are written to the log, which the slave applies directly to its data set. When rows are deleted, the data contained in the deleted row is written to the log; when rows are inserted, the inserted rows are logged, and when rows are updated, the data from the old and new version of the row is logged.

Row-based logging has a number of advantages over statement-based logging, perhaps the most significant being the fact that it's absolutely deterministic, since the actual data from the actual rows is sent from machine to machine. There are interactions with no user-defined variables, no non-deterministic functions (e.g. UUID()), no issues with rows being accessed in any particular order (e.g. DELETE ... LIMIT) ... but since row-based logging only deals with the data in the individual rows, it's only applicable to DML.

DDL is still logged an SQL query -- a statement-based log event -- because that's how DDL is done -- you alter a table with ALTER TABLE... the row based log events are "another way" of changing row data, but there would be no reason for MySQL to implement "another way" of handling ALTER TABLE (or similar events). The row-based logging format is only interested in row images, not table objects.

DDL has none of the same problems with determinism that DML has, which is the primary problem that row-based replication solves -- there's more than one way to DELETE FROM t1 LIMIT 1 (which '1' row would get deleted is generally predictable but technically undefined) but there's already exactly and only one way to ALTER TABLE t1 DROP COLUMN c1.

How does this affect replication? It doesn't have any negative impact on replication, or any particular impact at all, since that's how row-based replication has always worked since it was introduced in MySQL.

It's something to keep in mind if you are filtering replication because of the potential impact on how filtering rules are interpreted... but if you are thinking about filtering replication (not replicating all of your tables), my advice is that it's usually best to just take those thoughts... and let them go. It's hard to think of any really good cases where filtering replication is worth the trouble when the default "replicate everything" is so much more straightforward and less subject to nuance, and if you have data that for some reason shouldn't be on the slave, there's always the BLACKHOLE storage engine.