MongoDB : Restoring space after db.repairDatabase

mongodb

This is my first post in this forum and I am very new to MongoDB. Before posting I searched for existing posts about repairDatabase, but they all seem to deal with more advanced cases. My question is very basic.

Below is a series of simple operations: inserting documents, deleting documents, and checking the size occupied by the collection.

I am logged into mongo and this is the start of my screen:

2016-12-16T12:17:17.521+0000 I CONTROL  [main] Hotfix KB2731284 or later update is not installed, will zero-out data files
MongoDB shell version: 3.2.9
connecting to: 127.0.0.1:27017/test

> show dbs
local  0.000GB

> db
test

> show collections

>// No collection at this moment (I entered this text here and not from shell screen)

> use mc
switched to db mc

> db.mc.storageSize()
> // Nothing returned 

> db.mc.totalSize()
> // Nothing returned 

> db.mc.insert({x:1})
WriteResult({ "nInserted" : 1 })

> db.mc.storageSize()
4096

> db.mc.totalSize()
8192

> db.mc.insert({x:2})
WriteResult({ "nInserted" : 1 })

> db.mc.storageSize()
4096 // Same as above

> db.mc.totalSize()
8192 // Same as above 

> for(i=1;i<100;i++){db.mc.insert({y:i})} ;
WriteResult({ "nInserted" : 1 })

> db.mc.count()
101 

> db.mc.storageSize()
16384 // Size increased

> db.mc.totalSize()
32768 // Size increased

> db.mc.remove({})
WriteResult({ "nRemoved" : 101 })

> db.mc.count()
0 

> db.mc.storageSize()
20480 // Size further increased

> db.mc.totalSize()
40960 // Size further increased 

> db.repairDatabase()
{ "ok" : 1 }

> db.mc.storageSize()
20480 // Same after repairDatabase 

> db.mc.totalSize()
24576 // Here size decreased after repairDatabase 

> show dbs
local  0.000GB
mc     0.000GB

Looking at the numbers produced by the operations above, I assumed the size would go back to empty, though show dbs shows mc as 0.000GB.

I am a bit confused about the size before and after repairDatabase. I thought the space would be released. I haven't tried the compact option.

It would be great if anyone monitoring this forum could give their thoughts.

Best Answer

I am a bit confused about the size before and after repairDatabase. I thought the space would be released. I haven't tried the compact option.

In general you should consider the repairDatabase command a last resort, to be used when you actually need to salvage or repair a database and don't have a better data source to copy from (e.g. a copy of the data from another replica set member or a backup). Historically the repair command was often (ab)used to forcibly rebuild databases using the older MMAPv1 storage engine, but this is no longer recommended practice. The repair operation obtains a global write lock and will block all other operations on your MongoDB server until it completes.

With MongoDB 3.2.3+ using WiredTiger (the default storage engine), you can use the compact command to release unused disk space to the operating system for a specific collection. Typically this isn't required unless you find space is not being reused effectively over time.

NOTE: while compacting a collection, other operations on the same database will be blocked, so this is an activity you would only want to perform during scheduled maintenance periods. Depending on the layout of data on disk, compaction may also not result in significant storage space savings. A better operational approach to reclaim all unused space while avoiding downtime would be to resync a mongod as a member of a replica set.
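As a rough sketch of what running compact on the collection from the question could look like, the helper below wraps the compact command and compares storageSize before and after. The function name compactAndReport and the shape of the returned object are made up for illustration; only db.getCollection(), storageSize() and the compact command itself come from the shell.

```javascript
// Hypothetical helper: run compact on one collection and report how much
// storageSize changed. `database` is a mongo shell database handle such as
// the global `db`. Remember that compact blocks other operations on the
// same database while it runs.
function compactAndReport(database, collectionName) {
  var coll = database.getCollection(collectionName);
  var before = coll.storageSize();
  var result = database.runCommand({ compact: collectionName });
  var after = coll.storageSize();
  return {
    ok: result.ok,
    storageBefore: before,
    storageAfter: after,
    reclaimed: before - after
  };
}

// In the mongo shell, after `use mc`:
//   printjson(compactAndReport(db, "mc"));
```

The reclaimed figure may well be 0 for a small collection, since compaction does not always yield significant savings.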

A few points that should help clarify your observations on size numbers with WiredTiger:

  • Data in WiredTiger is compressed by default and the representation on disk is different from the representation in memory.
  • There are several potentially interesting measures of size including data size, storageSize, totalIndexSize, and totalSize (which adds storageSize + totalIndexSize). You can inspect these metrics (and more) via db.collection.stats().
  • Data written to files on disk is grouped into block allocations (in multiples of 4KB) which is why your single document insertion resulted in a 4KB storageSize and 8KB totalSize (the extra 4KB is the initial allocation for the _id index file). Block allocations are variable sized and managed by the WiredTiger storage engine.
  • The WiredTiger storage engine does not perform in-place updates on data; it uses an MVCC (MultiVersion Concurrency Control) approach with Snapshots and Checkpoints. This can lead to some apparently contradictory observations where storage space may need to grow to accommodate multiple document versions even though the data size is not increasing. Allocations from older document versions will be marked available for reuse as checkpoints complete, and WiredTiger will try to minimize storage fragmentation with a best-fit allocation from existing storage where possible.
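To make the relationship between these measures concrete, here is a small sketch that pulls the size fields out of a db.collection.stats() result and recomputes totalSize. The helper name summarizeSizes is hypothetical. The sample input uses the numbers from the question after repairDatabase (storageSize 20480, totalSize 24576, which implies a totalIndexSize of 4096); the data size of 0 is an assumption based on the collection being empty, since the question did not print it.

```javascript
// Hypothetical helper: summarize the size fields of a collection stats document.
function summarizeSizes(stats) {
  var total = stats.storageSize + stats.totalIndexSize;
  return {
    dataBytes: stats.size,               // uncompressed size of the documents
    storageBytes: stats.storageSize,     // blocks allocated to the collection file
    indexBytes: stats.totalIndexSize,    // blocks allocated to all index files
    totalBytes: total,                   // same sum that totalSize() reports
    totalGB: (total / (1024 * 1024 * 1024)).toFixed(3)
  };
}

// Numbers taken from the question's output after repairDatabase.
// size: 0 is assumed (empty collection), not shown in the question.
var afterRepair = summarizeSizes({ size: 0, storageSize: 20480, totalIndexSize: 4096 });

// In the mongo shell you could pass real stats instead:
//   printjson(summarizeSizes(db.mc.stats()));
```

With these inputs totalBytes is 24576 and totalGB is "0.000", which is why a few tens of kilobytes of allocated blocks still display as 0.000GB.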

Looking at the numbers produced by the operations above, I assumed the size would go back to empty, though show dbs shows mc as 0.000GB.

  • The show dbs command indicates the actual data size (e.g. db.mc.stats().size), which may be much larger than the storageSize (due to compression) or smaller than storageSize (due to preallocated/unused space). Your example is the latter: there is storage space available for reuse.
  • When you deleted documents you used db.collection.remove({}). This syntax deletes matching documents without removing collection metadata (e.g. index definitions) or compacting storage space. If you want to more immediately drop an entire collection and free storage space you should instead use db.collection.drop().
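A minimal sketch of the drop() alternative, assuming you really do want to discard the whole collection, including its index definitions. The wrapper name dropAndReport is hypothetical; db.getCollection(), storageSize() and drop() are the shell methods it relies on.

```javascript
// Hypothetical helper: drop a collection outright instead of remove({}).
// Unlike remove({}), this also discards index definitions, so any secondary
// indexes have to be recreated if the collection is reused.
function dropAndReport(database, collectionName) {
  var coll = database.getCollection(collectionName);
  var storageBefore = coll.storageSize();
  var dropped = coll.drop(); // true when the collection existed and was dropped
  return { dropped: dropped, storageBefore: storageBefore };
}

// In the mongo shell, after `use mc` (this permanently deletes the collection):
//   printjson(dropAndReport(db, "mc"));
```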

For more information, see: