MongoDB replica sets and data duplication

mongodb

I am new to MongoDB and trying to understand replica sets. Specifically, I am trying to understand how MongoDB would handle the following situation:

  1. A record is written to a table with a unique index on the primary Mongo member (Mongo 1).
  2. Mongo 1 crashes before the new record is propagated to the secondary Mongo member (Mongo 2).
  3. Mongo 2 and the arbiter elect Mongo 2 to be the primary.
  4. The application attempts to create a duplicate record in the table. Because Mongo 2 is not aware of the record written to Mongo 1, the unique constraint is not enforced.
  5. Mongo 1 becomes available again and replication resumes.

What happens next? Does Mongo 1 attempt to replicate the record to Mongo 2? Or Mongo 2 to Mongo 1? Or both simultaneously? How is the conflict resolved?

Best Answer

In this case, the following will happen:

  1. When a member rejoins the replica set, MongoDB determines the most recent point in time up to which the member's oplog is identical to that of the rest of the set.
  2. If the rejoining member has oplog entries beyond that point, MongoDB attempts to revert them and saves the affected data for human analysis. This is called a rollback in MongoLand.
  3. If the node was down for longer than the oplog window, it becomes stale as usual, and a manual resync is required. In that case the node's unreplicated writes are effectively discarded along with the rest of its data when it is resynced.
  4. In MongoDB versions before 4.0, the maximum size of a rollback is 300 MB. While this sounds small, it is actually quite a lot, as those 300 MB only need to hold the data written during the time it takes a primary to figure out that it is no longer part of a replica set (for example, because of a network partition).
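
To make steps 1 and 2 concrete for the scenario in the question, here is a minimal sketch in plain Python. It is a toy model, not MongoDB's actual implementation: the `OplogEntry` type, the `common_point` and `rejoin` helpers, and the e-mail values are all hypothetical names made up for illustration, and a real oplog entry carries far more than a timestamp and an election term. The sketch assumes Mongo 1 accepted a write that never replicated, and Mongo 2 later accepted a write with the same unique key after being elected.

```python
from typing import List, NamedTuple, Tuple


class OplogEntry(NamedTuple):
    ts: int      # simplified logical timestamp (assumption: a plain counter)
    term: int    # election term in which the entry was written
    email: str   # value of the uniquely indexed field


def common_point(local: List[OplogEntry], remote: List[OplogEntry]) -> int:
    """Return how many leading entries the two oplogs share."""
    shared = 0
    for mine, theirs in zip(local, remote):
        if mine != theirs:
            break
        shared += 1
    return shared


def rejoin(local: List[OplogEntry],
           remote: List[OplogEntry]) -> Tuple[List[OplogEntry], List[OplogEntry]]:
    """Simulate a former primary rejoining the set.

    Entries after the common point are removed from the local oplog and
    returned separately (the "saved for human analysis" part); the local
    oplog then catches up from the current primary.
    """
    shared = common_point(local, remote)
    rolled_back = local[shared:]
    caught_up = local[:shared] + remote[shared:]
    return caught_up, rolled_back


# Both members agree on the first write.
replicated = [OplogEntry(ts=1, term=1, email="alice@example.com")]

# Mongo 1 accepted this write in term 1 but crashed before replicating it.
mongo1 = replicated + [OplogEntry(ts=2, term=1, email="bob@example.com")]

# Mongo 2 was elected (term 2) and accepted the "duplicate" write.
mongo2 = replicated + [OplogEntry(ts=2, term=2, email="bob@example.com")]

mongo1_after, rolled_back = rejoin(mongo1, mongo2)
print("Mongo 1 oplog after rejoining:", mongo1_after)
print("Rolled back on Mongo 1:", rolled_back)
```

In this model the write that only Mongo 1 ever saw is the one that gets reverted, so the unique index never ends up holding two documents with the same key. Nothing is replicated from Mongo 1 to Mongo 2. On a real deployment the reverted documents are written to files in a rollback directory under the node's data path, and it is up to you to decide whether to reapply them.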

Please read more on this at Rollbacks During Replica Set Failover and Resync a Member of a Replica Set in the docs.