Mysql – Both replication masters stop by themselves

MySQLreplication

I did a master/master replication with MySQL on a Gentoo OVH Release 2, all work fine in my test phase. I put it in production, the replication works fine for 1 or 2 days, but this morning, I don't know why, my slave stops running!

So now the log position is bad. I can't just restart slave and my replication doesn't work.

I want a master/master replication to make a backup server with an IP failover, so only one server is writing/reading in the database at same time.

When I go to MySQL and click on Show slave status, I see an error like:

Error 'Duplicate entry '411465' for key 1' on query. Default database …. etc.

Did this error stop my replication? If yes, why do I get this error? The second server does nothing on the database, so normally, there is no problem with the autoincrement, right?

Best Answer

The provided error Error 'Duplicate entry '411465' for key 1' on query means that the slave read and attempted to execute a binary log event to insert a row that already existed, ie. the same value 411465 for your primary key.

The most likely cause of this is the the insert was executed on the slave. To diagnose the query, you would use mysqlbinlog and use the binary log coordinates from SHOW SLAVE STATUS. This will give you the server-id that the query originated from, which will match either your main master or 'passive' master.

Once you determine the query, you can identify the row on the server that is throwing the slave error to determine next steps. You can choose to :

  • skip the entry using SET GLOBAL sql_slave_skip_counter=1 to proceed to the next binary log statement
  • Delete the specific row on the server and start slave to have the statement run from replication.

However, you need to take steps to understand how the mismatch occurred, or you're going to run into this again. This will require some more detective work from your end using mysqlbinlog.

If, as you say, only one master is writeable at a time, you should ensure the following:

  • passive master is read_only=1 and your failover solution is able to modify read_only.
  • the user that your application runs as (or any other non-trusted user) does not have the SUPER privilege. Any user with SUPER privilege can execute statements even on read_only=1 servers.
  • setup pt-table-checksum to ensure the data is in sync on both servers.