MongoDB – Slow startup after upgrade to 4.2.5

aws, mongodb, mongodb-4.2

I upgraded a 3-member MongoDB replica set from 4.0.11 to 4.2.5.

Since the upgrade, startup takes about 5 minutes.

Before the upgrade it was instant.

It seems related to oplog size: as a test I dropped the oplog on the new 4.2 instance and startup was instant again.

The maximum oplog size was 25 GB; I decreased it to 5 GB and startup is still slow.
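
(For reference, on MongoDB 3.6+ the oplog of a running replica set member can be resized with the replSetResizeOplog admin command; the size is given in megabytes. A minimal sketch, assuming a 5 GB target:)

db.adminCommand({ replSetResizeOplog: 1, size: 5120 })  // 5120 MB = 5 GB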

MongoDB runs on AWS with standard EBS volumes.

However, it worked well on these disks until this upgrade.

Here is the relevant part of the log output:

2020-04-07T16:36:19.391+0000 I  STORAGE  [initandlisten] Detected data files in /var/lib/mongodb created by the 'wiredTiger' storage engine, so setting the active storage engine to 'wiredTiger'.
2020-04-07T16:36:19.391+0000 I  STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=7279M,cache_overflow=(file_max=0M),session_max=33000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000,close_scan_interval=10,close_handle_minimum=250),statistics_log=(wait=0),verbose=[recovery_progress,checkpoint_progress],
2020-04-07T16:36:19.931+0000 I  STORAGE  [initandlisten] WiredTiger message [1586277379:931910][29458:0x7f2d17521b40], txn-recover: Recovering log 2439 through 2440
2020-04-07T16:36:20.002+0000 I  STORAGE  [initandlisten] WiredTiger message [1586277380:2298][29458:0x7f2d17521b40], txn-recover: Recovering log 2440 through 2440
2020-04-07T16:36:20.086+0000 I  STORAGE  [initandlisten] WiredTiger message [1586277380:86290][29458:0x7f2d17521b40], txn-recover: Main recovery loop: starting at 2439/15997696 to 2440/256
2020-04-07T16:36:20.087+0000 I  STORAGE  [initandlisten] WiredTiger message [1586277380:87145][29458:0x7f2d17521b40], txn-recover: Recovering log 2439 through 2440
2020-04-07T16:36:20.140+0000 I  STORAGE  [initandlisten] WiredTiger message [1586277380:139996][29458:0x7f2d17521b40], txn-recover: Recovering log 2440 through 2440
2020-04-07T16:36:20.189+0000 I  STORAGE  [initandlisten] WiredTiger message [1586277380:189056][29458:0x7f2d17521b40], txn-recover: Set global recovery timestamp: (1586277366, 1)
2020-04-07T16:36:20.197+0000 I  RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1586277366, 1)
2020-04-07T16:36:20.205+0000 I  STORAGE  [initandlisten] Starting OplogTruncaterThread local.oplog.rs
2020-04-07T16:36:20.205+0000 I  STORAGE  [initandlisten] The size storer reports that the oplog contains 260431 records totaling to 5145508040 bytes
2020-04-07T16:36:20.205+0000 I  STORAGE  [initandlisten] Sampling the oplog to determine where to place markers for truncation
2020-04-07T16:36:20.207+0000 I  STORAGE  [initandlisten] Sampling from the oplog between Apr  6 14:15:08:222 and Apr  7 16:36:06:1 to determine where to place markers for truncation
2020-04-07T16:36:20.207+0000 I  STORAGE  [initandlisten] Taking 981 samples and assuming that each section of oplog contains approximately 2654 records totaling to 52436838 bytes
2020-04-07T16:38:48.156+0000 I  STORAGE  [initandlisten] Placing a marker at optime Apr  6 14:15:16:84
…..
2020-04-07T16:38:48.156+0000 I  STORAGE  [initandlisten] Placing a marker at optime Apr  7 16:29:31:13
2020-04-07T16:38:48.156+0000 I  STORAGE  [initandlisten] WiredTiger record store oplog processing took 147951ms
2020-04-07T16:38:48.160+0000 I  STORAGE  [initandlisten] Timestamp monitor starting

The delay occurs when WiredTiger starts sampling the oplog. After decreasing the oplog size to 5 GB there is still a lag of more than 2 minutes at startup; it was above 5 minutes when the maximum oplog size was 25 GB.
Before the upgrade to 4.2 the oplog size was also 25 GB, yet startup was instant.
Does anyone have an idea what could cause the slow startup? Thanks.

Best Answer

I tried changing the following three WiredTiger eviction parameters from their defaults:

storage.wiredTiger.engineConfig.configString: "eviction_dirty_target=60,eviction_dirty_trigger=80,eviction=(threads_min=4,threads_max=4)"

Now mongod starts immediately. Is it safe to set eviction_dirty_target and eviction_dirty_trigger this high? The defaults are 5% for eviction_dirty_target and 20% for eviction_dirty_trigger. Thanks.
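
For completeness, this is how such a configString would sit in mongod.conf (a sketch using the values from above, not a recommendation):

storage:
  wiredTiger:
    engineConfig:
      # passed verbatim to wiredtiger_open at startup
      configString: "eviction_dirty_target=60,eviction_dirty_trigger=80,eviction=(threads_min=4,threads_max=4)"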