There are a couple of separate points here, but I don't think how MongoDB stores data in RAM is really relevant - MongoDB just uses the mmap() call and then lets the kernel take care of memory management (the Linux kernel uses a Least Recently Used (LRU) policy by default to decide what to page out and what to keep - there are more specifics to it, but they are not terribly relevant here).
In terms of your issues, it sounds like you might have had a corrupt index, though the evidence is somewhat circumstantial. Now that you have done a repair (the validate() command would have confirmed/denied beforehand), there won't be any evidence in the current data but you may find more evidence in the logs, particularly when you were attempting to recreate the index, or using the index in queries.
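For future reference, a sketch of that check from the mongo shell (the collection name here is a placeholder - run it against the suspect collection; note that validate takes a lock and can be slow on large collections):

```javascript
// "mycoll" is a placeholder - substitute the suspect collection.
// { full: true } requests a more thorough (and slower) structural check.
db.mycoll.validate({ full: true })
// The result includes a "valid" field plus details of any detected corruption.
```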
As for the spikes in the page faults, btree stats, journal, lock percentage, and average flush time, that has all the hallmarks of a bulk delete that causes a lot of index updates, and causes a large amount of IO. The fact that mapped memory drops off later in the graphs would suggest that once you ran the repair the storage size was significantly reduced, which usually indicates significant fragmentation (bulk deletes, along with updates that grow documents are the leading causes of fragmentation).
Therefore, I would look for a large delete operation logged as slow in the logs - it will only be logged once complete, so look for it to appear after the end of the events in MMS. One of the quirks of not running in a replica set is that a bulk operation like this is relatively non-obvious - it shows up as a single delete operation in the MMS graphs (usually lost in the noise).
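As a rough sketch (the log path is an assumption - adjust for your installation), you could look for the completed delete in the mongod log, since slow operations are logged with their duration once they finish:

```shell
# Path is illustrative; slow ops are logged on completion with a duration in ms,
# so a bulk delete will appear in the log after the spikes seen in MMS.
grep -i "delete" /var/log/mongodb/mongod.log | grep "ms$"
```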
These bulk delete operations tend to be run on older data that has not been recently used and has hence been paged out of active memory by the kernel (LRU again). To delete that data you must page it all back in, then flush the changes to disk, and of course deletes require the write lock - hence the spikes in faults, lock percentage etc.
To make room for the data being deleted, your current working set is paged out, which will hurt performance for your normal usage until the deletes complete and the memory pressure eases.
FYI - when you run a replica set, bulk ops are serialized in the oplog and hence replicated one at a time - as such you can track such operations by their footprint in the replicated ops stats of the secondaries. This is not possible with a standalone instance, short of looking in the logs for the completed ops and other secondary indications.
As for managing large deletes in the future, it is generally far more efficient to partition your data into separate databases (if possible) and then drop the old data when it is no longer needed by simply dropping the old databases. This requires some extra management on the application side but it negates the need for bulk deletes, is far quicker to complete, limits fragmentation, and dropped databases also remove the files on disk, preventing excessive storage use. Definitely recommended if possible with your use case.
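As an illustrative sketch of that pattern (the database and collection names are invented here), partitioning by time period might look like this in the mongo shell:

```javascript
// Write to the database for the current period (eg. one database per month):
var current = db.getSiblingDB("events_2013_05");
current.log.insert({ ts: new Date(), level: "info" });

// Retiring old data is then a cheap metadata operation rather than a bulk
// delete - the data files for the dropped database are removed from disk:
db.getSiblingDB("events_2013_01").dropDatabase();
```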
That query is simply a long running read without any criteria (so it is running against all data). As it fetches the data back, it does so in batches (based on your batch size) and then issues a getmore on the same cursor for the next set of results.

The numYields count does not mean the query is being blocked; it means that it yielded its lock when needed. This is usually done for a write, and usually when the original query had to page fault to disk to get data, after which it resumes (when querying all data in a collection, this is going to happen often unless all your data + indexes fit in RAM).

Therefore, the query is not being blocked - in fact the getmore operations show that it is progressing over time. Most long running reads will have a similar profile, especially if you are writing to the database at the same time.
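To illustrate the batching (the collection name and batch size are arbitrary here), a full scan in the mongo shell looks like this - each batch after the first arrives via a getmore on the same cursor:

```javascript
// No criteria, so this touches every document; batchSize controls how many
// documents come back per getmore round trip.
var cursor = db.mycoll.find().batchSize(1000);
while (cursor.hasNext()) {
    cursor.next();  // each exhausted batch triggers a getmore
}
```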
It is also not likely that this query is the cause of any crash (it's just a read) - it's more likely that something else is causing the crash, and you are associating this query with the crash because it happens to be running at the time the crash occurs (people often suspect the serverStatus command for the same reason - it is run once a minute by MMS). I would recommend posting the full messaging around the crash as a separate question for proper diagnosis.
For what it's worth, with snapshot set to true, and the fact that it is reading all data, I suspect this is a mongodump query (it defaults to using snapshot to avoid duplicates being dumped when data is moved).
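For comparison, a typical invocation (the host, database name, and output path are placeholders) would be:

```shell
# mongodump reads every document; by default it uses snapshot mode so that
# documents moved on disk during the dump are not returned twice.
mongodump --host localhost:27017 --db mydb --out /backups/mydb
```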
In general you should consider the repairDatabase command a last resort, to be used when you actually need to salvage or repair a database and don't have a better data source to copy from (eg. a copy of the data from another replica set member or a backup). Historically the repair command was often (ab)used to forcibly rebuild databases using the older MMAP storage engine, but this is no longer recommended practice. The repair operation obtains a global write lock and will block all other operations on your MongoDB server until complete.

With MongoDB 3.2.3+ using WiredTiger (the default storage engine), you can use the compact command to release unused disk space to the operating system for a specific collection. Typically this isn't required unless you find space is not being reused effectively over time.

NOTE: while compacting a collection, other operations on the same database will be blocked, so this is an activity you would only want to perform during scheduled maintenance periods. Depending on the layout of data on disk, compaction may also not result in significant storage space savings. A better operational approach to reclaim all unused space while avoiding downtime would be to resync a mongod as a member of a replica set.

A few points that should help clarify your observations on size numbers with WiredTiger:

- The key metrics are storageSize, totalIndexSize, and totalSize (which adds storageSize + totalIndexSize). You can inspect these metrics (and more) via db.collection.stats().
- A newly created collection starts with a 4KB storageSize and 8KB totalSize (the extra 4KB is the initial allocation for the _id index file). Block allocations are variable sized and managed by the WiredTiger storage engine.
- The storage size reported by the show dbs command does not indicate the actual data size (eg. db.mc.stats().size), which may be much larger than the storageSize (due to compression) or smaller than storageSize (due to preallocated/unused space). Your example is the latter: there is storage space available for reuse.
- db.collection.remove({}) deletes matching documents without removing collection metadata (eg. index definitions) or compacting storage space. If you want to immediately drop an entire collection and free its storage space you should instead use db.collection.drop().
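A short sketch of inspecting and compacting (using the mc collection from the example above; remember that compact blocks other operations on the database, so run it in a maintenance window):

```javascript
// Check data size vs allocated storage for the collection:
db.mc.stats()   // includes size, storageSize, totalIndexSize, totalSize

// Release unused space for this collection back to the operating system:
db.runCommand({ compact: "mc" })
```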