MySQL Replication query – Primary key need

MySQLmysql-5.6perconaprimary-keyreplication

I have few doubts regarding MySQL replication (Master-Slave).
Is it mandatory for a table to have primary query for replication to function properly?

Referring this Percona link for the above question as it mentions that If there is no primary key or unique key defined then it’s even worse because INSERT may be re-executed and you will get multiple rows with the same data – which again means you’ve got inconsistent data with the master

But with InnoDB as the storage engine, if a table does not have primary or a unique query defined, the engine itself creates hidden clustered index on a synthetic column containing row ID values as per Jeremy Cole's blog

So, even in the case primary or unique not present, the replication should not have any impact it in itself creates a clustered index which should ensure the replication is smooth, correct? I'm not sure on this part.

Would be great if someone can throw some light on the need of Primary key in Master-Slave Replication setup.

Best Answer

Your assumption is incorrect.

The hidden clustered index does nothing to help, here, because the server can't use that index to find rows in a way that replication needs.

This causes some problems when the master is logging in STATEMENT mode, different problems in ROW mode, and all of the problems, combined, in MIXED mode. Here's the tl;dr on the biggest pitfall:

Every replication event against a table without a primary or unique key whose value is accessible through the SQL interface has the potential to require a full table scan for each row replicated.

The inaccessibility of the hidden clustered index to the SQL layer excludes it from use... and it wouldn't help, even if it were not as hidden, since its value is not deterministic from server to server, and so its value is never written to the binlog.

Simply enough... you should not create tables without accessible primary keys. It's bad design.

I would argue that a valid table cannot have two identical rows -- every relation has at least one candidate key, by definition, and candidate keys must be unique, also by definition. Choose one and make it primary, or create a surrogate (auto-increment) so that it's visible. If you have unique constraints, you need to define them in the schemata, not just impose them in code.