MongoDB secondaries become unresponsive during replication

Tags: mongodb, mongodb-3.4, wiredtiger

We have three servers running: one primary and two secondaries. The primary has 4 vCPUs and 16 GB of memory; both secondaries have 8 vCPUs and 64 GB of memory.

Every night, we run a full sync of several large collections on the primary, using multiple threads.

During that sync, both secondaries become unavailable from time to time.
mongod.log shows the following notice:

serverstatus was very slow: { after basic: 0, after asserts: 0, after 
backgroundFlushing: 0, after connections: 0, after dur: 0, after extra_info: 
0, after globalLock: 0, after locks: 0, after network: 0, after opLatencies: 
0, after opcounters: 0, after opcountersRepl: 0, after repl: 0, after 
security: 0, after storageEngine: 0, after tcmalloc: 0, after wiredTiger: 
4992, at end: 4992 }
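
To make a notice like this easier to read, here is a minimal sketch that picks out the section where the time went. It assumes the "after <section>" values are cumulative milliseconds since the command started, so the cost of a section is its value minus the previous one. The function name slowestServerStatusSection is made up for illustration and is not part of MongoDB.

```javascript
// Hypothetical helper (not a MongoDB API): find the serverStatus section
// that accounts for the largest share of the elapsed time in a
// "serverStatus was very slow" log line.
// Assumption: "after <section>" values are cumulative milliseconds.
function slowestServerStatusSection(logLine) {
  // \s+ and \s* tolerate the line wrapping seen in copied log output.
  var pattern = /after\s+([A-Za-z_]+):\s*(\d+)/g;
  var previous = 0;
  var slowest = { section: null, millis: 0 };
  var match;
  while ((match = pattern.exec(logLine)) !== null) {
    var elapsed = parseInt(match[2], 10);
    var delta = elapsed - previous; // time spent in this section alone
    if (delta > slowest.millis) {
      slowest = { section: match[1], millis: delta };
    }
    previous = elapsed;
  }
  return slowest;
}

// Abridged version of the notice above.
var line = "serverstatus was very slow: { after basic: 0, after locks: 0, " +
           "after tcmalloc: 0, after wiredTiger: 4992, at end: 4992 }";
var result = slowestServerStatusSection(line);
// result is { section: "wiredTiger", millis: 4992 }
```

Under that assumption, the whole 4992 ms was spent collecting the wiredTiger section; every earlier section returned immediately.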

Mongostat during that time states:

Our clients have readPreference set to secondaries only, but we don't have many connections during that time.

The configuration is the MongoDB default, with no special tweaks.

So the only thing I see is that the MongoDB log reports an "after wiredTiger" entry with a high elapsed time. Any clue what's happening here?

The MongoDB version in use is 3.4.16.

Best Answer

What you are observing is an expected effect of replication batches being applied on your MongoDB 3.4 secondaries. The original design intent of replica sets was to provide data redundancy and failover, so efficient application of replicated writes takes priority over secondary reads. Replicated writes are applied on secondaries in multithreaded batches organised for better write throughput, which means that the order of writes may temporarily diverge from the primary. To avoid out-of-order secondary reads during critical sections of write activity, read operations will periodically have to wait while a replication batch is being applied.

This will have a more noticeable impact during periods of high replication activity or where your secondary is generally having challenges keeping up with write activity.

The items you've highlighted are expected in MongoDB replica sets prior to 4.0:

- Intermittent increased latency for serverStatus commands or read operations on secondaries.
- Since mongostat collects serverStatus output on a timed interval, the "no data received" indicates that a result wasn't returned before the next reporting interval.
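
To confirm the second point on a secondary, one option is to time a serverStatus call directly. The following is a rough sketch for the mongo shell; the helper name timeServerStatus is made up for illustration, and db is the shell's usual database handle. If the measured time exceeds the mongostat reporting interval (one second by default), a missed report for that interval is what you would expect.

```javascript
// Hypothetical helper (not a MongoDB API): measure how long one
// serverStatus call takes, in milliseconds.
// `db` is any object exposing serverStatus(), such as the mongo shell's db.
function timeServerStatus(db) {
  var start = Date.now();
  db.serverStatus(); // may block while a replication batch is applied
  return Date.now() - start;
}

// Usage in the mongo shell, connected to a secondary:
//   print(timeServerStatus(db) + " ms");
```

Running this a few times during the nightly sync and again during a quiet period should show whether the stalls line up with replication activity.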

Improvements in subsequent releases of MongoDB have enabled introduction of non-blocking secondary reads in MongoDB 4.0. If you rely on secondary reads and are impacted by latency under write load, it would definitely be worth planning an upgrade. For more information about the changes in MongoDB 4.0, see: Scaling Your Replica Set: Non-Blocking Secondary Reads in MongoDB 4.0.

For some general caveats on using secondaries for read scaling, see: Can I use more replica nodes to scale?.

Related Solutions

MongoDB fails with SymInitialize error unless there is a very large Page File in Windows

Under Windows, in a worst-case scenario, your pagefile size might have to be set to the size of your data files plus your physical memory size. So if your data files take up 50 GB on disk, the rough guidance, in your case, is to set the pagefile size to 53.5 GB. This will improve with the MongoDB 2.8 release, since the new storage engine does not rely on virtual memory services provided by the OS. On a related subject, your memory size of 3.5 GB sounds very low. Take a look at the Hard Page Faults per second under the Resource Monitor: if the number is in the hundreds, you need to dramatically increase your memory size.
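
The rule of thumb above is simple addition. As a small sketch (the function name worstCasePagefileGB is made up for illustration):

```javascript
// Hypothetical helper: worst-case Windows pagefile size, following the
// rule of thumb above (data file size plus physical memory size).
function worstCasePagefileGB(dataFilesGB, physicalMemoryGB) {
  return dataFilesGB + physicalMemoryGB;
}

// 50 GB of data files and 3.5 GB of RAM, as in the answer above.
var pagefileGB = worstCasePagefileGB(50, 3.5); // 53.5
```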

Mongodb high lock percentage / slow queries

To check whether the hardware is the limiting factor:

top/htop => cpu percentage
iostat -x 1 => sysstat tool to see disk r/w limits (%util)

Concerning locking:

Mongo 2.6 : database locking
Mongo 3.0 + MMAPv1 storage engine : collection locking
Mongo 3.0 + WiredTiger storage engine : document locking

If you have one huge collection (server-prod), sharding may be an option to distribute the load, or more cores plus less locking with MongoDB 3.0.

Improve indexes:
- More indexes = slower writes + faster reads
- Fewer indexes = faster writes + slower reads

Read from secondaries; write only to the primary.

> db.setProfilingLevel(1, 4)  // profile operations on this db that take longer than 4 ms
> db.system.profile.find({millis: {$gt: 100}}).sort({ts: -1})  // find operations slower than 100 ms, newest first
> ....query.explain()  // find out which indexes the query uses

Information: http://docs.mongodb.org/manual/administration/optimization/
