MongoDB 2.4.1 instance shuts down without log

Tags: crash, mongodb, mongodb2.4, shutdown

I am running MongoDB in a replica set of 3 nodes (no sharding). For the past 5 days, the secondary instance has been shutting down intermittently, roughly once every 2 days.

The log does not show any event related to the shutdown, but it does show an error related to the secondary running TTL expiry:

Thu Dec 26 16:28:25.404 [conn12625]  authenticate db: local { authenticate: 1, nonce: "**", user: "__system", key: "***" }
Thu Dec 26 16:28:25.404 [conn12626]  authenticate db: local { authenticate: 1, nonce: "**", user: "__system", key: "***" }
Thu Dec 26 16:28:29.652 [TTLMonitor] Assertion: 13312:replSet error : logOp() but not primary?
0xdc7f71 0xd8963b 0xa63ca3 0xa60f69 0xa72fd4 0xc3d4c1 0xc3e725 0xd8c233 0xd8cce4 0xe10879 0x35040079d1 0x35038e88fd
 /MONGO/dir/product/bin/mongod(_ZN5mongo15printStackTraceERSo+0x21) [0xdc7f71]
 /MONGO/dir/product/bin/mongod(_ZN5mongo11msgassertedEiPKc+0x9b) [0xd8963b]
 /MONGO/dir/product/bin/mongod() [0xa63ca3]
 /MONGO/dir/product/bin/mongod(_ZN5mongo5logOpEPKcS1_RKNS_7BSONObjEPS2_Pbb+0x49) [0xa60f69]
 /MONGO/dir/product/bin/mongod(_ZN5mongo13deleteObjectsEPKcNS_7BSONObjEbbbPNS_11RemoveSaverE+0x10d4) [0xa72fd4]
 /MONGO/dir/product/bin/mongod(_ZN5mongo10TTLMonitor10doTTLForDBERKSs+0xfe1) [0xc3d4c1]
 /MONGO/dir/product/bin/mongod(_ZN5mongo10TTLMonitor3runEv+0x345) [0xc3e725]
 /MONGO/dir/product/bin/mongod(_ZN5mongo13BackgroundJob7jobBodyEN5boost10shared_ptrINS0_9JobStatusEEE+0xc3) [0xd8c233]
 /MONGO/dir/product/bin/mongod(_ZN5boost6detail11thread_dataINS_3_bi6bind_tIvNS_4_mfi3mf1IvN5mongo13BackgroundJobENS_10shared_ptrINS7_9JobStatusEEEEENS2_5list2INS2_5valueIPS7_EENSD_ISA_EEEEEEE3runEv+0x74) [0xd8cce4]
 /MONGO/dir/product/bin/mongod() [0xe10879]
 /lib64/libpthread.so.0() [0x35040079d1]
 /lib64/libc.so.6(clone+0x6d) [0x35038e88fd]
Thu Dec 26 16:28:29.655 [TTLMonitor] ERROR: error processing ttl for db: my_db 13312 replSet error : logOp() but not primary?

There is a MongoDB bug reported against versions 2.4/2.4.1 for this error:
https://jira.mongodb.org/browse/SERVER-9261

My question is: will this error ultimately cause a replica to crash?

Note: Due to a configuration setting, I am unable to get a core dump of the crash.

UPDATE
The primary node also shut down. I checked /var/log/messages on that node for that date and found the following entries:

kernel: INFO: task mongod:11514 blocked for more than 120 seconds.
Dec 28 03:24:33 <hostname> kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Dec 28 03:24:33 <hostname> kernel: mongod        D 0000000000000001     0 11514      1 0x00000080

It seems the mongod process was blocked for more than 120 seconds while flushing its cache, and during that block the mongod instance was assumed to have shut down.

Best Answer

It looks like the assertion you are encountering is SERVER-9053 ("TTL index asserts on 2.4 secondary"), which was fixed in 2.4.2. This appears to be a race condition in which the TTL monitor thread incorrectly tries to apply TTL deletes on a secondary. The assertion does not appear to be fatal: if it were, there would be a "Fatal assertion" message logged, followed by the mongod process shutting down.
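To check a whole log for this distinction rather than reading it by eye, something like the following rough Python sketch could help. It assumes the 2.4-style log lines shown in the question and that a fatal assertion is marked by the text "Fatal assertion" (matched case-insensitively, since the exact capitalization may differ); the function name `classify_assertions` is my own and not part of any MongoDB tooling.

```python
import re

# Assumption: fatal assertions contain the text "Fatal assertion" (any case).
FATAL_RE = re.compile(r"fatal assertion", re.IGNORECASE)
# Ordinary assertions look like "Assertion: 13312:replSet error ..." in the question's log.
ASSERT_RE = re.compile(r"\bAssertion:\s*\d+")


def classify_assertions(lines):
    """Split log lines into (fatal, non_fatal) assertion messages."""
    fatal, non_fatal = [], []
    for line in lines:
        text = line.strip()
        if FATAL_RE.search(text):
            fatal.append(text)
        elif ASSERT_RE.search(text):
            non_fatal.append(text)
    return fatal, non_fatal


# Example with the assertion line from the question.
sample = [
    "Thu Dec 26 16:28:29.652 [TTLMonitor] Assertion: 13312:replSet error : logOp() but not primary?",
]
fatal, non_fatal = classify_assertions(sample)
print(len(fatal), len(non_fatal))  # prints "0 1"
```

To run it against a real log, pass an open file object (for example `classify_assertions(open(path))`, where `path` is wherever your mongod log lives). An empty `fatal` list supports the conclusion that mongod did not shut itself down because of an assertion.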

Unexplained shutdowns without MongoDB log messages are commonly the result of something external to mongod, such as the Out-of-Memory (OOM) Killer on Linux. You can usually find more info on activity such as the OOM Killer by searching your system logs.
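As a starting point for that search, here is a minimal Python sketch that filters system log lines for kernel events mentioning mongod. The hung-task pattern is taken from the message in the question's update; the OOM patterns are assumptions based on common kernel wording, which varies between kernel versions, so adjust them for your system. The names `PATTERNS` and `find_kernel_events` are hypothetical.

```python
import re

# Assumed patterns: kernel message wording differs between versions.
PATTERNS = {
    "oom_killer": re.compile(r"invoked oom-killer|out of memory|killed process", re.IGNORECASE),
    "hung_task": re.compile(r"task \S+:\d+ blocked for more than \d+ seconds"),
}


def find_kernel_events(lines, process="mongod"):
    """Yield (kind, line) for log lines that mention `process` and match a pattern."""
    for line in lines:
        if process not in line:
            continue
        for kind, pattern in PATTERNS.items():
            if pattern.search(line):
                yield kind, line.rstrip()
                break


# Example with the hung-task line from the question's update.
sample = ["kernel: INFO: task mongod:11514 blocked for more than 120 seconds."]
for kind, line in find_kernel_events(sample):
    print(kind, "|", line)
```

On a real system you would feed it the lines of /var/log/messages (or your distribution's equivalent) and compare the timestamps of any hits with the times mongod disappeared.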

Note: MongoDB 2.4.1 was released in April 2013 and the 2.4 release series reached end of life in March 2016. I'd strongly recommend upgrading to the final 2.4.14 release and then planning an upgrade to a supported version.
