MongoDB – Size of data on secondary exceeded primary after its initial resync from the primary

Tags: data synchronization, mongodb, replication

I stepped down the primary node, deleted its data, and performed an initial resync from the new primary. But when I checked the size of the data after the resync completed, I found that this node held 409G of data, whereas the node it had synced from held 326G. So the size increased by 83G after the resync.
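For reference, the per-member footprint can be compared from the mongo shell; a minimal sketch (the scale argument is illustrative and reports sizes in GB):

    // Run against the same database on each replica set member.
    // On a 2.6 secondary, allow reads first with: rs.slaveOk()
    // "dataSize" is the logical size of the documents; "storageSize"
    // includes preallocated and padded space on disk.
    db.stats(1024 * 1024 * 1024)   // scale: bytes -> GB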

I have performed such initial resyncs before, and every time the size of the data decreased after the resync. In this case it is the opposite.

Recently I upgraded the cluster from version 2.4.9 to 2.6.7, and this was my first initial resync since the upgrade. So is this increase in size because of the new MongoDB version? Does the same data take more storage space in version 2.6 than in version 2.4?

Best Answer

Does the same data take more storage space in version 2.6?

As noted in the MongoDB 2.6 release notes, the default storage allocation strategy changed from exact-fit allocation (with a padding factor) to usePowerOf2Sizes. The power-of-2 allocation may take up more space initially, but generally results in less storage fragmentation and better reuse of space freed by deleted documents.
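To confirm which strategy a given collection is using after the upgrade, the collection stats expose this under MMAPv1; a minimal sketch, with the collection name being a placeholder:

    // userFlags: 1 => usePowerOf2Sizes is in effect; 0 => exact-fit allocation.
    // paddingFactor shows the legacy padding used by exact-fit allocation.
    var s = db.mycollection.stats();
    printjson({ userFlags: s.userFlags, paddingFactor: s.paddingFactor });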

One notable exception is if you are using GridFS and have data saved with older drivers. The original GridFS chunk size was 256KB (an exact power of 2), so each chunk document (chunk data plus document overhead) ended up rounded to 512KB (the next power of 2). Drivers were updated to lower the GridFS chunk size to 255KB, but old 256KB chunks are a common reason for database size to grow after a resync with power-of-2 allocation (see SERVER-13331 in the MongoDB issue tracker).
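One way to check whether a GridFS bucket still holds files written with the old chunk size is to look at the chunkSize values recorded in the files collection; a hedged sketch, assuming the default bucket name fs:

    // 262144 bytes = 256KB chunks from older drivers (rounded up to 512KB
    // under power-of-2 allocation); 261120 bytes = 255KB from updated drivers.
    db.fs.files.distinct("chunkSize")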

Changing the allocation strategy

In MongoDB 2.6 you can change the allocation strategy at the collection level using the collMod command, or set the default for newly created collections at the server level with the newCollectionsUsePowerOf2Sizes parameter. Changing the strategy for an existing collection only affects documents inserted or moved in storage after the change; you would have to resync, compact, or repair the database to rewrite all existing documents with the new allocation.
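As a rough sketch of both approaches (the collection name is illustrative):

    // Collection level: disable (or re-enable with true) power-of-2 allocation.
    db.runCommand({ collMod: "mycollection", usePowerOf2Sizes: false })

    // Server level: default for newly created collections, set at mongod startup:
    //   mongod --setParameter newCollectionsUsePowerOf2Sizes=false

    // Rewrite existing documents so they pick up the new allocation
    // (compact blocks operations on the database; run it on each member in turn).
    db.runCommand({ compact: "mycollection" })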

If you are using GridFS and have older documents with 256KB chunks, it would make sense to change the allocation strategy for the GridFS chunks collections to not use usePowerOf2Sizes. Similarly, if a collection's use case happens to be insert-only (or its documents never grow via updates), exact-fit allocation will be more efficient.

Otherwise, the power-of-2 allocation is recommended.

WiredTiger storage engine supports compression in MongoDB 3.0+

If storage size on disk is a concern, you should also consider testing the new WiredTiger storage engine in MongoDB 3.0, which includes support for index & data compression. Upgrading to MongoDB 3.0 & WiredTiger will require you to use updated drivers & tools, and as with any major version upgrade you should review the upgrade procedures relevant for your deployment as well as any compatibility changes.
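If you do test it, a minimal sketch of starting a standalone MongoDB 3.0 instance on WiredTiger (the dbpath is illustrative; snappy is the default block compressor for collection data):

    # WiredTiger cannot reuse MMAPv1 data files; start with an empty dbpath
    # and resync the member (or dump/restore) to migrate the data.
    mongod --storageEngine wiredTiger \
           --dbpath /data/wiredtiger-test \
           --wiredTigerCollectionBlockCompressor snappy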