MongoDB Too Many Open Files Error – How to Fix

mongodb

We suddenly started seeing our MongoDB servers crash in a Docker container (image mongo:4.4) with this error:

{"error":"Location13538: couldn't open [/proc/1/stat] Too many open files in system","file":"src/mongo/util/processinfo_linux.cpp","line":78}

You can see the full log here:

db_1              | {"t":{"$date":"2020-12-01T12:47:43.080+00:00"},"s":"I",  "c":"COMMAND",  "id":51803,   "ctx":"conn60","msg":"Slow query","attr":{"type":"command","ns":"myDB.user","command":{"aggregate":"user","pipeline":[{"$match":{"__t":"user"}},{"$unwind":{"path":"$security.tokens.learner","preserveNullAndEmptyArrays":true}},{"$unwind":{"path":"$security.tokens.cp","preserveNullAndEmptyArrays":true}},{"$unwind":{"path":"$security.tokens.admin","preserveNullAndEmptyArrays":true}},{"$match":{"$or":[{"$or":[{"$and":[{"trash.status":false},{"$or":[{"learner.lock.status":false},{"learner.lock.status":{"$exists":false}}]},{"learner.suspend.status":false},{"security.tokens.learner.token":"asdjasdjkasdasdbashd"},{"security.tokens.learner.expireDate":{"$gte":{"$date":"2020-12-01T12:47:42.977Z"}}},{"shared.roles.learner":true}]},{"$and":[{"security.tokens.learner.token":"asdjasdjkasdasdbashd"},{"security.tokens.learner.expireDate":{"$gte":{"$date":"2020-12-01T12:47:42.977Z"}}},{"$or":[{"learner.lock.status":false},{"learner.lock.status":{"$exists":false}}]},{"learner.suspend.status":false},{"$or":[{"facebook.registered":true},{"google.registered":true},{"linkedin.registered":true}]}]}]},{"$and":[{"trash.status":false},{"$or":[{"cp.lock.status":false},{"cp.lock.status":{"$exists":false}}]},{"cp.suspend.status":false},{"security.tokens.cp.token":"asdjasdjkasdasdbashd"},{"security.tokens.cp.expireDate":{"$gte":{"$date":"2020-12-01T12:47:42.977Z"}}},{"shared.roles.cp":true}]},{"$and":[{"trash.status":false},{"$or":[{"admin.lock.status":false},{"admin.lock.status":{"$exists":false}}]},{"admin.suspend.status":false},{"security.tokens.admin.token":"asdjasdjkasdasdbashd"},{"security.tokens.admin.expireDate":{"$gte":{"$date":"2020-12-01T12:47:42.977Z"}}},{"shared.roles.admin":true}]}]}}],"cursor":{},"$db":"myDB"},"planSummary":"COLLSCAN","keysExamined":0,"docsExamined":1569,"cursorExhausted":true,"numYields":2,"nreturned":1,"queryHash":"167A82D6","planCacheKey":"C324EA6F","reslen":4753,"locks":{"ReplicationStateTransition":{"acquireCount":{"w":5}},"Global":{"acquireCount":{"r":5}},"Database":{"acquireCount":{"r":5}},"Collection":{"acquireCount":{"r":5}},"Mutex":{"acquireCount":{"r":3}}},"storage":{},"protocol":"op_msg","durationMillis":101}}
db_1              | {"t":{"$date":"2020-12-01T12:58:25.000+00:00"},"s":"E",  "c":"-",        "id":23077,   "ctx":"ftdc","msg":"Assertion","attr":{"error":"Location13538: couldn't open [/proc/1/stat] Too many open files in system","file":"src/mongo/util/processinfo_linux.cpp","line":78}}
db_1              | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"thread68","msg":"WiredTiger error","attr":{"error":23,"message":"[1606827505:339548][1:0x7f0bbd7e2700], log-server: __directory_list_worker, 46: /data/db/journal: directory-list: opendir: Too many open files in system"}}
db_1              | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"thread68","msg":"WiredTiger error","attr":{"error":23,"message":"[1606827505:339753][1:0x7f0bbd7e2700], log-server: __log_prealloc_once, 505: log pre-alloc server error: Too many open files in system"}}
db_1              | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"thread68","msg":"WiredTiger error","attr":{"error":23,"message":"[1606827505:339773][1:0x7f0bbd7e2700], log-server: __log_server, 961: log server error: Too many open files in system"}}
db_1              | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"E",  "c":"STORAGE",  "id":22435,   "ctx":"thread68","msg":"WiredTiger error","attr":{"error":-31804,"message":"[1606827505:339785][1:0x7f0bbd7e2700], log-server: __log_server, 961: the process must exit and restart: WT_PANIC: WiredTiger library panic"}}
db_1              | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"F",  "c":"-",        "id":23089,   "ctx":"thread68","msg":"Fatal assertion","attr":{"msgid":50853,"file":"src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp","line":520}}
db_1              | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"F",  "c":"-",        "id":23090,   "ctx":"thread68","msg":"\n\n***aborting after fassert() failure\n\n"}
db_1              | {"t":{"$date":"2020-12-01T12:58:25.339+00:00"},"s":"F",  "c":"CONTROL",  "id":4757800, "ctx":"thread68","msg":"Writing fatal message","attr":{"message":"Got signal: 6 (Aborted).\n"}}
db_1              | {"t":{"$date":"2020-12-01T12:58:25.350+00:00"},"s":"E",  "c":"CONTROL",  "id":31430,   "ctx":"thread68","msg":"Error collecting stack trace","attr":{"error":"unw_get_proc_name(55CF8D21B921): unspecified (general) error\nerror: unw_step: unspecified (general) error\nunw_get_proc_name(55CF8D21B921): unspecified (general) error\nerror: unw_step: unspecified (general) error\n"}}

This is the result of running ulimit -a inside the mongo container:

$ ulimit -a
core file size          (blocks, -c) unlimited
data seg size           (kbytes, -d) unlimited
scheduling priority             (-e) 0
file size               (blocks, -f) unlimited
pending signals                 (-i) 31806
max locked memory       (kbytes, -l) 16384
max memory size         (kbytes, -m) unlimited
open files                      (-n) 90000
pipe size            (512 bytes, -p) 8
POSIX message queues     (bytes, -q) 819200
real-time priority              (-r) 0
stack size              (kbytes, -s) 8192
cpu time               (seconds, -t) unlimited
max user processes              (-u) unlimited
virtual memory          (kbytes, -v) unlimited
file locks                      (-x) unlimited

How do I fix this?

Best Answer

I found the problem, and it was not related to MongoDB itself. Another process on the host had a file descriptor leak and was exhausting the kernel-wide limit, so MongoDB could no longer open the files it needed. The wording of the error is the clue: "Too many open files in system" (errno 23, ENFILE) means the system-wide fs.file-max limit was hit, not the per-process open files limit, which is why the 90000 from ulimit -a above looked perfectly fine.
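
You can confirm which limit you are hitting with a quick check like this (a sketch of where to look, not part of my original debugging session):

# Allocated handles, unused handles, and the kernel-wide maximum.
# If the first number is close to the third, you are hitting ENFILE.
cat /proc/sys/fs/file-nr

# The same maximum, exposed as a sysctl:
sysctl fs.file-max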

First of all, I checked the file descriptor count of every process with this command (for more information about what a file descriptor is, read this answer: https://stackoverflow.com/a/5256705/7339000):

cd /proc
# Every numeric directory under /proc is a PID; counting the entries
# in its fd/ directory gives that process's open file descriptors.
# (2>/dev/null hides PIDs we are not allowed to inspect.)
for pid in [0-9]*
do
    echo "PID = $pid with $(ls /proc/$pid/fd/ 2>/dev/null | wc -l) file descriptors"
done
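
If you have many processes, a variation of the same loop sorted by count puts the worst offenders on top (a hypothetical one-liner, same idea as the loop above):

# print "count PID" for every process, biggest consumers first
for pid in /proc/[0-9]*; do echo "$(ls "$pid/fd" 2>/dev/null | wc -l) ${pid#/proc/}"; done | sort -rn | head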

Then I realized we had a Node.js process holding more than 40000 file descriptors, which was unusual; in fact, it had a file descriptor leak. Once we fixed that leak, we didn't encounter any issue with MongoDB anymore.
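
If you need breathing room while you hunt down a leak like this, you can inspect what the suspect process actually has open and, as a stopgap only, raise the kernel-wide cap (the PID and limit below are placeholder examples, not values from our incident):

# List what the suspect process (placeholder PID 1234) has open;
# a long run of near-identical sockets, pipes, or files usually
# points straight at the leak.
ls -l /proc/1234/fd | head -n 20

# Temporary stopgap: raise the system-wide cap until the leak is fixed
# (2097152 is an example value, tune it to your host).
sysctl -w fs.file-max=2097152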