MongoDB secondaries become unresponsive during replication

mongodb mongodb-3.4 wiredtiger

We have 3 servers running: one primary and two secondaries. The primary has 4 vCPUs and 16 GB of memory; both secondaries have 8 vCPUs and 64 GB of memory.

Every night, we run a full sync of several large collections against the primary, using multiple threads.

During that sync, both secondaries become unavailable from time to time.
mongod.log shows the following notice:

serverstatus was very slow: { after basic: 0, after asserts: 0, after 
backgroundFlushing: 0, after connections: 0, after dur: 0, after extra_info: 
0, after globalLock: 0, after locks: 0, after network: 0, after opLatencies: 
0, after opcounters: 0, after opcountersRepl: 0, after repl: 0, after 
security: 0, after storageEngine: 0, after tcmalloc: 0, after wiredTiger: 
4992, at end: 4992 } 
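One way to read that notice: the "after <section>" numbers appear to be cumulative milliseconds since the serverStatus command started (every entry is 0 until wiredTiger, and "at end" equals the wiredTiger value). Assuming that interpretation, a minimal Python sketch can turn the cumulative values into per-section durations so the slow section stands out. The function name section_durations is hypothetical, not part of any MongoDB tool.

```python
import re


def section_durations(log_line):
    """Convert the cumulative 'after <section>: <ms>' timings from a
    'serverStatus was very slow' log line into per-section durations in ms.

    Assumes each value is cumulative time since the command started.
    """
    # Matches e.g. "after wiredTiger: 4992" or "at end: 4992", tolerating
    # line wraps between "after" and the section name or after the colon.
    pairs = re.findall(r"(after\s+\w+|at\s+end):\s*(\d+)", log_line)
    durations = {}
    previous = 0
    for label, value in pairs:
        label = " ".join(label.split())  # collapse wrapped whitespace
        name = label[len("after "):] if label.startswith("after ") else label
        cumulative = int(value)
        durations[name] = cumulative - previous  # time spent in this section
        previous = cumulative
    return durations


if __name__ == "__main__":
    excerpt = ("serverStatus was very slow: { after basic: 0, after tcmalloc: 0, "
               "after wiredTiger: 4992, at end: 4992 }")
    print(section_durations(excerpt))
    # {'basic': 0, 'tcmalloc': 0, 'wiredTiger': 4992, 'at end': 0}
```

Applied to the full excerpt, essentially all of the ~5 seconds is attributed to the wiredTiger section.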

mongostat output during that time:

[Image: mongostat statistics]

Our clients have readPreference set to secondary (secondaries only), but there aren't many connections during that time, though.

We use the default MongoDB configuration with no special tweaks.

The only thing I can see is that the mongod log reports a high time for the "after wiredTiger" section. Any idea what's happening here?

MongoDB version: 3.4.16

Best Answer

What you are observing is an expected effect of replication batches being applied on your MongoDB 3.4 secondaries. The original design intent of replica sets was to provide data redundancy and failover, so efficient application of replicated writes takes priority over secondary reads. Replicated writes are applied on secondaries in multithreaded batches organised for better write throughput, which means that the order of writes may temporarily diverge from the primary. To avoid out-of-order secondary reads during critical sections of write activity, read operations will periodically have to wait while a replication batch is being applied.

This has a more noticeable impact during periods of high replication activity, or when your secondaries generally struggle to keep up with write activity.
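To make the waiting behavior concrete, here is a toy Python model. It is not MongoDB's implementation: a single lock stands in, as an assumption, for whatever mechanism a 3.4 secondary uses to keep readers out while a batch is applied, and the names batch_lock, apply_batch and timed_read are hypothetical. A read that arrives mid-batch waits roughly as long as the rest of the batch takes.

```python
import threading
import time

# Hypothetical stand-in for the mechanism that blocks reads during a batch.
batch_lock = threading.Lock()


def apply_batch(duration_s, started=None):
    """Simulate applying one replication batch; reads are blocked meanwhile."""
    with batch_lock:
        if started is not None:
            started.set()  # signal that the batch now holds the lock
        time.sleep(duration_s)  # stand-in for applying the batch's writes


def timed_read():
    """Simulate a secondary read; return how long it waited, in seconds."""
    start = time.monotonic()
    with batch_lock:
        pass  # the actual read would happen here
    return time.monotonic() - start


if __name__ == "__main__":
    started = threading.Event()
    applier = threading.Thread(target=apply_batch, args=(0.5, started))
    applier.start()
    started.wait()  # issue the read while the batch is in progress
    waited = timed_read()
    applier.join()
    print(f"read waited {waited:.2f}s during batch, "
          f"{timed_read():.2f}s afterwards")
```

Longer or more frequent batches (for example, during a bulk nightly sync) translate directly into longer or more frequent stalls for reads in this model.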

The items you've highlighted are expected in MongoDB replica sets prior to 4.0:

  • Intermittent increased latency for serverStatus commands or read operations on secondaries.
  • Since mongostat collects serverStatus output on a timed interval, the "no data received" indicates that a result wasn't returned before the next reporting interval.
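The second point can be modeled with a small Python sketch of an interval-based poller. This is not how mongostat is implemented; fetch_status is a hypothetical callable standing in for a serverStatus call, and poll_once is an illustrative name. If the call doesn't return within the reporting interval, the poller reports "no data received" for that tick, even though the server is still working on the request.

```python
import concurrent.futures
import time


def poll_once(fetch_status, interval_s, executor):
    """Run one status fetch; report 'no data received' if it misses the interval.

    fetch_status is a hypothetical stand-in for a serverStatus call.
    """
    future = executor.submit(fetch_status)
    try:
        return future.result(timeout=interval_s)
    except concurrent.futures.TimeoutError:
        # The call is still running; this reporting tick simply has no data.
        return "no data received"


def fast_status():
    return {"ok": 1}


def slow_status():
    time.sleep(1.5)  # e.g. a serverStatus stuck behind a replication batch
    return {"ok": 1}


if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=2) as executor:
        print(poll_once(fast_status, 1.0, executor))
        print(poll_once(slow_status, 1.0, executor))
```

Here the slow call eventually completes, but its result arrives too late for the one-second tick, which matches the intermittent gaps seen in mongostat.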

Improvements in subsequent releases enabled the introduction of non-blocking secondary reads in MongoDB 4.0. If you rely on secondary reads and are affected by latency under write load, it would definitely be worth planning an upgrade. For more information about the changes in MongoDB 4.0, see: Scaling Your Replica Set: Non-Blocking Secondary Reads in MongoDB 4.0.

For some general caveats on using secondaries for read scaling, see: Can I use more replica nodes to scale?.