MongoDB durability and consistency with j:true and w:majority

mongodb replication

Assume there is a 7-member replica set where every node is a voting member and there is no arbiter. For a write w1, using j:true and w:majority ensures that the write is written to the journal on a majority of the nodes (4 of 7) before it is acknowledged. Let's also assume it reaches the disk after the default time interval.

Suppose 3 nodes that all have w1, including the primary, fail at the same time, so that only one of the remaining 4 nodes has w1. If that node does not get enough votes (say it is overloaded and not suitable to be primary) and some other node gets enough votes to become primary, we have effectively lost the write w1.

I know this is a rare case, but it can still happen. So isn't it true that even j:true and w:majority do not guarantee that writes are durable?

Best Answer

This is not how it works. With j:true and w:majority, w1 is only acknowledged once a majority of the nodes have reported to the primary that they have written the operation to their journal. If the primary goes down before that happens, the acknowledgement never arrives. In that edge case you only have to do what you always have to do when a failover happens before a write operation is persisted: redo it.
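The acknowledgement rule can be sketched in a few lines of plain Python. This is only a toy model, not MongoDB code: the function names `majority` and `is_acknowledged` and the node labels `n1` to `n4` are made up for illustration, and the model assumes every member is a voting, data-bearing node, as in the question.

```python
def majority(voting_members):
    """Smallest number of members that forms a strict majority."""
    return voting_members // 2 + 1


def is_acknowledged(journaled_nodes, voting_members):
    """Toy rule: a w:majority, j:true write is acknowledged only once
    a majority of the voting members have journaled it."""
    return len(journaled_nodes) >= majority(voting_members)


if __name__ == "__main__":
    # 7 voting members: 4 of them must journal the write.
    print(majority(7))
    # Primary plus two secondaries is not enough for an acknowledgement.
    print(is_acknowledged({"n1", "n2", "n3"}, 7))
    # A fourth journaled copy makes the write acknowledged.
    print(is_acknowledged({"n1", "n2", "n3", "n4"}, 7))
```

In this model, a client that never sees the acknowledgement has to treat the write as not persisted and retry it.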

Let us expand the example a bit: assume the write was acknowledged, and then all but one of the nodes that received it go down. Now an election takes place. The new primary is elected, roughly speaking, by each remaining node comparing the latest timestamp it has in its oplog: a node does not vote for a candidate whose oplog is behind its own. With only 4 of 7 members left, a candidate needs all 4 votes, so only the node that received w1 can win (the oplog is processed consecutively, entry by entry, so that node has everything the others have plus w1). The remaining nodes would then process the oplog from their latest position up to the new primary's latest position. w1 is replicated and hence becomes eventually consistent on all nodes.
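The election argument can be checked with a small simulation. This is a simplified sketch under stated assumptions, not the real election protocol: it ignores terms, priorities and timeouts, uses plain integers as oplog timestamps, and the names `votes_for`, `electable` and the node labels `n4` to `n7` are hypothetical. The only rule it models is that a node refuses to vote for a candidate whose oplog is behind its own.

```python
def majority(voting_members):
    """Smallest number of members that forms a strict majority."""
    return voting_members // 2 + 1


def votes_for(candidate, live_oplog_ts):
    """Count the live nodes (the candidate included) whose latest oplog
    timestamp is not ahead of the candidate's, i.e. who would vote yes."""
    candidate_ts = live_oplog_ts[candidate]
    return sum(1 for ts in live_oplog_ts.values() if ts <= candidate_ts)


def electable(live_oplog_ts, total_voting_members):
    """Live nodes that could collect a majority of ALL voting members."""
    needed = majority(total_voting_members)
    return [node for node in live_oplog_ts
            if votes_for(node, live_oplog_ts) >= needed]


if __name__ == "__main__":
    # 4 of 7 members survive; only n4 holds w1 (timestamp 101).
    live = {"n4": 101, "n5": 100, "n6": 100, "n7": 100}
    print(electable(live, 7))
```

In this model n4 collects all 4 votes, while each of the other nodes can collect at most 3 because n4 refuses them, which is below the 4 votes a 7-member set requires.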