Mongodb – Get size of a mongo database in a clustered environment

mongodb

As far as I can see, the db.stats() command reports the size of a database. My question: is there a way to get the total size across all nodes in the cluster for a specific database?

Best Answer

AFAIK, no. You'll have to write a script around db.stats().
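One way such a script could look is sketched below. It assumes you can connect to each node separately and that pymongo is installed. The helper names, the list of node URIs and the choice of fields to total are illustrative assumptions, not something from the original answer. Note that replica set members each hold a full copy of the data, so the total reflects disk used across nodes rather than the logical data size.

```python
from typing import Dict, Iterable, Mapping

# Fields to total; these names appear in db.stats() output.
SIZE_FIELDS = ("dataSize", "storageSize", "indexSize")


def sum_db_stats(per_node_stats: Iterable[Mapping[str, float]],
                 fields: Iterable[str] = SIZE_FIELDS) -> Dict[str, float]:
    """Add up the chosen size fields across one stats document per node."""
    fields = tuple(fields)
    totals = {field: 0 for field in fields}
    for stats in per_node_stats:
        for field in fields:
            # A node that does not report a field contributes nothing to it.
            totals[field] += stats.get(field, 0)
    return totals


def collect_db_stats(node_uris, db_name):
    """Run the dbstats command on each node (hypothetical helper).

    Requires pymongo; it is imported here so that sum_db_stats can be
    used without it.
    """
    from pymongo import MongoClient

    results = []
    for uri in node_uris:
        client = MongoClient(uri)
        try:
            results.append(client[db_name].command("dbstats"))
        finally:
            client.close()
    return results
```

With placeholder URIs, usage would be along the lines of `sum_db_stats(collect_db_stats(["mongodb://node1:27017", "mongodb://node2:27017"], "mydb"))`.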

Related Solutions

Mongodb – Mongo Collection `Size` is larger than `storageSize`

storageSize is the sum of all extents for that data, excluding indexes.

So that collection takes up 2 extents of ~2GB each, hence ~4GB. size includes indexes and, I believe, a couple of other things that inflate the number. Neither really represents the on-disk size. For disk size, db.stats() has a fileSize field, which is closer to what I think you're looking for.

The manual is somewhat better at outlining what the various fields mean, see here for collections:

http://docs.mongodb.org/manual/reference/collection-statistics/

And here for database stats:

http://docs.mongodb.org/manual/reference/database-statistics/

Some other potentially relevant information:

The compact command does not shrink any datafiles; it only defragments deleted space so that larger objects might reuse it. The compact command will never delete or shrink database files, and in general requires extra space to do its work, usually a minimum of one extra extent.

If you repair the database it will essentially rewrite the data files from scratch, which will remove padding and store them on disk as efficiently as you are going to get. However you will need to have ~2x the size on disk to do so (actually less, but it's a decent guide).
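As a rough pre-flight check for the ~2x guideline above, the sketch below compares free disk space against the current size of the data files. The function names and the `headroom` parameter are illustrative assumptions. The current size would come from something like the fileSize value in db.stats(), and the path would be your data directory.

```python
import shutil


def has_room_for_repair(current_bytes: int, free_bytes: int,
                        headroom: float = 1.0) -> bool:
    """True if free space covers one extra copy of the data files.

    Needing ~2x the size on disk in total means needing roughly 1x the
    current size free; raise headroom above 1.0 for a safety margin.
    """
    return free_bytes >= current_bytes * headroom


def free_bytes_on(path: str) -> int:
    """Free space, in bytes, on the filesystem that holds path."""
    return shutil.disk_usage(path).free
```

Because the repair usually needs somewhat less than a full second copy, this errs on the cautious side.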

One other thing to bear in mind: repair and compact remove padding. The padding factor varies between 1 (no document moves caused by growth) and 2 (lots of moves caused by growth). Your padding factor of ~1.67 indicates that your documents are growing, and hence being moved, quite a bit.

When you compact or repair a database you remove that padding, so subsequent document growth is going to trigger even more moves than before. Because moves are relatively expensive operations, this can have a serious impact on your performance. More info here:

http://www.mongodb.org/display/DOCS/Padding+Factor

MongoDB Sharding – Using IP Addresses Instead of Hostnames

Generally it is not recommended to use IP addresses to configure a cluster. You have not mentioned your environment, but it is common for IP addresses to change in many environments with a reboot, or perhaps you will need to move one or more of the nodes to a different host in the future.

Should that happen, your IP address will change. Hostnames (even ones defined in the hosts file rather than DNS) give you a layer of abstraction: the hostname stays the same, so you do not have to alter your database configuration for such a move.

This becomes particularly important for config servers in a sharded environment. An altered IP address is procedurally the same as an altered host name, and as you can see from this procedure, that means that moving a config server will require downtime for your cluster.

With all that said, for testing or for an environment where you can tear everything down and start again (or similar), these concerns do not really apply and you can use IP addresses in your configuration. For any system that will run long term, or any system that will run in production and require minimal interruption, then the use of hostnames over IP addresses is certainly recommended.
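If you want to audit an existing configuration for this, a small check like the following could flag members that are listed by IP literal rather than by hostname. It is only a sketch: the function names are made up, and it assumes members are written as `host`, `host:port` or `[ipv6]:port` strings.

```python
import ipaddress


def is_ip_literal(member: str) -> bool:
    """True if the host part of a 'host[:port]' string is an IP address."""
    host = member.strip()
    if host.startswith("["):
        # Bracketed IPv6 form, e.g. "[::1]:27017".
        host = host[1:].split("]", 1)[0]
    elif host.count(":") == 1:
        # Single colon: treat the right-hand side as a port.
        host = host.split(":", 1)[0]
    try:
        ipaddress.ip_address(host)
        return True
    except ValueError:
        return False


def members_using_ips(members):
    """Return the members that are configured by IP instead of hostname."""
    return [m for m in members if is_ip_literal(m)]
```

For example, given `["10.0.0.5:27017", "db1.example.net:27017"]`, only the first entry is reported.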
