MongoDB fails with SymInitialize error unless there is a very large Page File in Windows

mongodb virtualisation windows-server

I'm wondering if someone can help with a MongoDB/page-file issue that we seem to hit as our database grows over time. Every now and then MongoDB crashes on a particular node in our replica set, and it won't start again until we significantly increase our page file size (doubling it is safe). Currently, the page file is 42GB. We are running MongoDB as a three-node replica set, and each node runs on a Windows Server 2012 R2 virtual machine in Azure. Each server has 3.5GB of memory. The MongoDB version is 2.6.5.

I've seen the related posts below, and I understand that MongoDB uses memory-mapped files and that virtual memory presumably helps with that when we haven't got lots of RAM. What I don't understand is:

Why does MongoDB require so much memory on start-up (a page file of more than 32GB for a 131GB database) when the working set is relatively small (~100MB)? Presumably it can swap the files in and out as it needs, especially with a page file this large, so why is MongoDB crashing?

Here are the posts I've found so far:

mongodb memory usage is going high even if only insertions are made

and this one

SERVER-10044, which explains why Mongo crashes and implies VMs are worse

Thanks in advance for any help.

To provide more context, we are using MongoDB to log data, so most of the collections are written to but rarely read from, with the exception of a few small collections (100MB total) which are subject to constant reads and writes. The data is stored in a single MongoDB database, the stats for which are shown below (db and collection names modified):

    "db" : "MyDatabase",
    "collections" : 854,
    "objects" : 243025868,
    "avgObjSize" : 541.2304596809423,
    "dataSize" : 131533002252,
    "storageSize" : 172592721920,
    "numExtents" : 7268,
    "indexes" : 1934,
    "indexSize" : 27824138048,
    "fileSize" : 210284576768,
    "nsSizeMB" : 16,
    "dataFileVersion" : {
            "major" : 4,
            "minor" : 5
    },
    "extentFreeList" : {
            "num" : 3,
            "totalSize" : 110592
    },
    "ok" : 1

The working set appears to be around the 100 MB mark, as illustrated below:

    "workingSet" : {
            "note" : "thisIsAnEstimate",
            "pagesInMemory" : 20874,
            "computationTimeMicros" : 26236,
            "overSeconds" : 876
    },
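
As a rough cross-check of that figure, the page count can be converted into a size. This is only a sketch: it assumes `pagesInMemory` counts 4 KiB pages (the usual page size on x86-64 Windows), and the function name is made up for illustration.

```python
# Value copied from the workingSet output above.
PAGES_IN_MEMORY = 20874

# Assumption: each page is 4 KiB, the usual page size on x86-64 Windows.
PAGE_SIZE_BYTES = 4096


def working_set_mib(pages: int, page_size: int = PAGE_SIZE_BYTES) -> float:
    """Convert a count of resident pages into mebibytes."""
    return pages * page_size / 2**20


if __name__ == "__main__":
    print(f"Estimated working set: {working_set_mib(PAGES_IN_MEMORY):.1f} MiB")
```

Under that assumption the estimate comes out a little below 100MB, which is consistent with the figure quoted above.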

The log file output on the secondary that most recently failed is as follows (this occurred during start-up although the first time it failed was during normal operation):


    2014-11-25T09:25:17.833+0000 [rsBackgroundSync] replSet syncing to: 10.1.6.71:27017
    2014-11-25T09:25:17.833+0000 [rsBackgroundSync] replset setting syncSourceFeedback to 10.1.6.71:27017
    2014-11-25T09:25:17.849+0000 [rsSync] replSet still syncing, not yet to minValid optime 54744561:c
    2014-11-25T09:25:18.286+0000 [rsSync] replSet SECONDARY
    2014-11-25T09:26:01.590+0000 [conn21] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after indexCounters: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after oplog: 10451, after recordStats: 10451, after repl: 10451, at end: 10451 }
    2014-11-25T09:26:01.590+0000 [conn21] command admin.$cmd command: serverStatus { serverStatus: 1, oplog: 1 } keyUpdates:0 numYields:0 locks(micros) r:65 reslen:4028 16764ms
    2014-11-25T09:26:31.155+0000 [DataFileSync] flushing mmaps took 15022ms for 115 files
    2014-11-25T09:26:47.501+0000 [conn5] serverStatus was very slow: { after basic: 0, after asserts: 0, after backgroundFlushing: 0, after connections: 0, after cursors: 0, after dur: 0, after extra_info: 0, after globalLock: 0, after indexCounters: 0, after locks: 0, after network: 0, after opcounters: 0, after opcountersRepl: 0, after oplog: 4791, after recordStats: 4791, after repl: 4791, at end: 4791 }
    2014-11-25T09:26:47.501+0000 [conn5] command admin.$cmd command: serverStatus { serverStatus: 1, oplog: 1 } keyUpdates:0 numYields:0 locks(micros) r:88 reslen:4028 7674ms
    2014-11-25T09:27:06.350+0000 [repl writer worker 6] VirtualProtect for m:/mongodb/data/MyDatabase.72 chunk 21220 failed with errno:1455 The paging file is too small for this operation to complete. (chunk size is 67108864, address is 14b90000000) in mongo::makeChunkWritable, terminating
    2014-11-25T09:27:06.350+0000 [repl writer worker 6] MyDatabase.RC_PUR_11_456754 Fatal Assertion 16362
    2014-11-25T09:27:06.615+0000 [repl writer worker 6] Stack trace failed, SymInitialize failed with error 3765269347
    2014-11-25T09:27:06.615+0000 [repl writer worker 6] MyDatabase.RC_PUR_11_456754
    2014-11-25T09:27:06.615+0000 [repl writer worker 6]

    ***aborting after fassert() failure

Best Answer

Under Windows, in the worst case, your page file might have to be as large as the size of your data files plus your physical memory. So if your data files take up 50GB on disk, the rough guidance in your case is to set the page file size to 53.5GB (50GB of data files plus 3.5GB of RAM). This should improve with the MongoDB 2.8 release, since the new storage engine does not rely on virtual memory services provided by the OS. On a related subject, your memory size of 3.5GB sounds very low. Take a look at the Hard Page Faults per second under the Resource Monitor: if the number is in the hundreds, you need to dramatically increase your memory size.
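
The rule of thumb above is just an addition, and it can be applied to the numbers in the question as a sketch. The helper name is hypothetical, and using the `fileSize` value from the database stats as the "size of your data files" is an assumption on my part, not something the rule spells out.

```python
GIB = 2**30


def worst_case_pagefile_gb(data_files_gb: float, ram_gb: float) -> float:
    """Worst-case page file size: data files on disk plus physical memory."""
    return data_files_gb + ram_gb


if __name__ == "__main__":
    # The example used in the answer: 50GB of data files, 3.5GB of RAM.
    print(worst_case_pagefile_gb(50, 3.5))

    # Assumption: "fileSize" from the stats in the question is the
    # on-disk size of the data files, in bytes.
    file_size_bytes = 210284576768
    estimate = worst_case_pagefile_gb(file_size_bytes / GIB, 3.5)
    print(f"Worst case for the database in the question: {estimate:.1f} GiB")
```

If that assumption holds, the worst case for the database in the question is several times larger than the current 42GB page file, which would fit with the crashes reappearing as the data grows.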