MongoDB “WriteConcernFailed” on a replica set

mongodb

I have a replica set of 4 members: 1 primary, 1 arbiter, 1 secondary, and one more that should be a secondary but has issues and is not connected (stuck in a "connecting…" state).

When the unconnected member is part of the replica set, I get errors on the other members.

From the server's mongo log:

[LogicalSessionCacheRefresh] Failed to refresh session cache: WriteConcernFailed: waiting for replication timed out; Error details: { wtimeout: true }

command config.$cmd command: update { update: "system.sessions", ordered: false, allowImplicitCollectionCreation: false, writeConcern: { w: "majority", wtimeout: 15000 }, $db: "config" } numYields:0 reslen:782 locks:{ Global: { acquireCount: { r: 208, w: 208 } }, Database: { acquireCount: { w: 208 } }, Collection: { acquireCount: { w: 208 } } } protocol:op_msg 15030ms

The write concern settings are w: 1 and wtimeout: 0, but "writeConcernMajorityJournalDefault" is set to true.

When I remove that member from the replica set, the errors disappear.

I don't understand: the write concern is set to w: 1, so how do I get a timeout?
Even if it were "majority", my secondary and primary are up, so I have 2/3 servers up; why would I get a timeout?

I understand that I get a timeout error because one of the members is down; I just don't really understand why.

Thanks!

Best Answer

Just to clarify in case someone stumbles upon this in the future.

MongoDB replica sets need a majority vote to elect a primary, so an odd number of nodes is advised: a 4-member replica set requires 3 votes, just like a 5-member replica set.
But with a 5-member replica set, 2 members can go down, while with 4 members only 1 can go down before the replica set enters a read-only state.
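The arithmetic above can be sketched in a few lines (a minimal illustration, not MongoDB's actual election code):

```python
# Majority of voting members needed to elect a primary:
# majority = floor(voting_members / 2) + 1
def election_majority(voting_members: int) -> int:
    return voting_members // 2 + 1

for n in (3, 4, 5):
    print(f"{n} members -> majority of {election_majority(n)}")
# A 4-member set needs 3 votes, the same as a 5-member set,
# so the 4th voting member adds no extra election fault tolerance.
```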

The OP's issue is a different one, though: it is about write concern "majority". Write concern "majority" requires acknowledgement from a majority of the voting members of the replica set. An arbiter is not a data-bearing node and thus cannot acknowledge a write, but it is still counted toward the total number of votes required.

Example: in the OP's deployment there are currently 2 data-bearing nodes and an arbiter.
If he adds another data-bearing node and performs an operation that requires write concern "majority" to pass, he can't get a majority acknowledgement, because the total number of members is 4 and thus the majority is 3.
Since the third data-bearing node is still syncing, it can't acknowledge the write, which forces the operation to time out.
If he were to remove the arbiter and add the data-bearing node afterwards, it should finish syncing and then be able to acknowledge writes. In the meantime he should not receive any more errors, because the majority in a PSS deployment (Primary-Secondary-Secondary) is 2, and since there are 2 healthy data-bearing nodes, the write concern "majority" operation will pass.
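The two scenarios can be compared with a small sketch (an illustration of the counting logic described above, assuming arbiters vote but never acknowledge writes):

```python
# Can a w:"majority" write be acknowledged?
# The majority is computed over all voting members (arbiters included),
# but only healthy data-bearing nodes can actually acknowledge.
def majority_write_possible(voting_members: int, healthy_data_bearing: int) -> bool:
    needed = voting_members // 2 + 1
    return healthy_data_bearing >= needed

# PSA + a still-syncing secondary: 4 voting members, only 2 can acknowledge
print(majority_write_possible(4, 2))  # False -> the write times out
# PSS: 3 voting members, 2 healthy data-bearing nodes
print(majority_write_possible(3, 2))  # True -> the write passes
```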