Mysql – Screwed up replication by sharing server ids

MySQLreplication

So I had my screwup for the week today. I was adding another pair of slaves and set the same server ID on the new slave as an old slave.

The layout kind of looks like

Master
|     |
\/    |
oldS  |
      \/
      newS1
      |
      |
      \/
      newS2

So to be clear the old slave (oldS) and the first new slave (newS1) share the same server ID.

It's not circular replication so I'm hoping things will turn out okay. I wouldn't have expected the fallout though.

Alarms started going off b/c oldS1 started falling farther and farther behind. Looking at the logdir it was making thousands and thousands of empty relay logs.

I stopped slaving on newS1 and that seemed to clear things up in that oldS1 stopped making empty relay logs and caught back up.

Both slaves seem to be in a consistent state up to the point I stopped slaving on newS1.

  • Will fixing everything be as simple as bouncing newS1 with a new, unique ID be kosher, especially considering newS1 is itself slaving to newS2?
  • Is there anything else to be cautious about?
  • Why did this result in oldS spawning empty relay log after empty relay log? I would have though oldS and newS1 had no knowledge of the others existence.
  • I thought relay log rolling was just determined by the slave itself. Is the master sending some signal that it should spawn a new relay log?

Best Answer

From my standpoint, you may have potentially introduced data drift into replication.

Baron Schwartz presented this as a puzzle in his blog.

You may have to reload oldS and newS1 with fresh data.

At the very least, you should use pt-table-checksum to see if

  • the data on Master differs from oldS
  • the data on Master differs from newS1

If the differences are not identical, you can either

Before you touch anything, please fix the server-id situation

The spawning of relay logs is to be expected because sibling slaves take turns getting SQL entries from the Master. They simply cannot share the same server-id. The Master will somehow alert subsequent slaves that I gave server-id an SQL statement already. Thus, the I/O thread on subsequent slaves will disconnect and retry. Consequently, empty relay logs increase. (Trust me, I have shot myself in the foot with this years ago).

This methodology of the Master talking to Slaves for this info allowed MySQL (eh, Oracle) to come up with semisync replication. This would break semisync replication as well. Even though MySQL 5.6 will soon introduce a Global Transaction ID into the mix, server-id will still be used in its method of checks-and-balances on the Master. After all, if an eagle had two eaglets, no eagle would spit into two mouths at the same time in order to feed them.