MongoDB Drive Size Ratios – Best Practices for Data, Journal, and Log


Is there a good ratio for drive sizes for MongoDB?

For example, if you have an 8 GB drive for the data, how big should the drive for the log be? And the drive for the journal?

For a little more background: I'm following a MongoDB tutorial that explains how to manually deploy MongoDB on EC2. One of its steps states that you should have:

"Individual PIOPS EBS volumes for data (1000 IOPS), journal (250 IOPS), and log (100 IOPS)."

Best Answer

You can put everything on one disk if you wish; you are not obligated to split it.

Journal

The journal will take up to 3 GB (or less than 400 MB if you use the --smallfiles option).

Journal + Pre-Allocation

Be aware: if you don't use --smallfiles, at least 8 GB (journal and oplog included) will be preallocated on your disk. This space is not lost; it is reserved in advance to improve MongoDB's write performance. With --smallfiles, only about 1.4 GB is preallocated.

For exploration and testing, start with --smallfiles.
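To make the sizing concrete, here is a small back-of-the-envelope sketch in Python. It uses only the figures quoted above (3 GB vs. under 400 MB of journal, 8 GB vs. 1.4 GB of preallocation) and assumes everything sits on one disk; the helper names are hypothetical, and real usage will vary with the MongoDB version, storage engine and oplog size.

```python
# Disk-budget sketch based only on the figures quoted in this answer.
# Assumption: these are rough numbers for the setup described here; actual
# usage varies with MongoDB version, storage engine and replica-set oplog size.

JOURNAL_GB_DEFAULT = 3.0      # journal without --smallfiles
JOURNAL_GB_SMALLFILES = 0.4   # "less than 400MB" with --smallfiles, used as an upper bound
PREALLOC_GB_DEFAULT = 8.0     # "at least 8GB" preallocated (journal and oplog included)
PREALLOC_GB_SMALLFILES = 1.4  # "only 1.4GB" preallocated with --smallfiles


def min_journal_volume_gb(smallfiles: bool) -> float:
    """Smallest journal volume that holds the journal files (hypothetical helper)."""
    return JOURNAL_GB_SMALLFILES if smallfiles else JOURNAL_GB_DEFAULT


def space_left_after_prealloc(disk_gb: float, smallfiles: bool) -> float:
    """GB left for documents when everything shares one disk (hypothetical helper).

    A value <= 0 means preallocation alone fills the disk.
    """
    reserved = PREALLOC_GB_SMALLFILES if smallfiles else PREALLOC_GB_DEFAULT
    return disk_gb - reserved


if __name__ == "__main__":
    for smallfiles in (False, True):
        print(
            f"smallfiles={smallfiles}: journal needs about "
            f"{min_journal_volume_gb(smallfiles)} GB; an 8 GB disk keeps about "
            f"{space_left_after_prealloc(8.0, smallfiles):.1f} GB for data"
        )
```

With these numbers, an 8 GB disk without --smallfiles is essentially consumed by preallocation alone, which is one more reason to start with --smallfiles on a small test setup.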

Logfiles

The size of the log files depends on the verbosity setting and on how intensively the system is used. Since you are talking about an 8 GB data disk, it won't be much. By default, only some system messages and errors are logged. (http://docs.mongodb.org/manual/reference/configuration-options/)

To rotate the log files, send SIGUSR1 to the mongod process ("kill -SIGUSR1 pid") or run "mongo --port 27017 --eval "db.runCommand({logRotate:1});" admin" (http://docs.mongodb.org/manual/reference/command/logRotate/). I then use a daily crontab entry to delete logs older than 3 days.
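The rotate-then-prune routine above could be scripted roughly as follows, for example to be called from a daily cron job. This is a sketch under assumptions: the log directory /log and the file name mongod.log are hypothetical placeholders, and it relies on mongod renaming the rotated file to mongod.log.<timestamp>, so only files matching that pattern and older than 3 days are deleted. Sending SIGUSR1 is Unix-only.

```python
import os
import signal
import time
from pathlib import Path
from typing import List, Optional

# Hypothetical defaults: adjust to your own log location and retention policy.
LOG_DIR = Path("/log")    # assumption: directory holding the mongod log
LOG_NAME = "mongod.log"   # assumption: name of the active log file
MAX_AGE_DAYS = 3          # keep rotated logs for 3 days, as in the answer


def rotate(pid: int) -> None:
    """Ask mongod to rotate its log; equivalent to `kill -SIGUSR1 <pid>`."""
    os.kill(pid, signal.SIGUSR1)


def prune_rotated_logs(
    log_dir: Path, log_name: str, max_age_days: float, now: Optional[float] = None
) -> List[Path]:
    """Delete rotated copies (log_name + '.<suffix>') older than max_age_days.

    The active log file itself never matches the pattern, so it is left alone.
    Returns the list of deleted paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400  # seconds per day
    deleted = []
    for path in sorted(log_dir.glob(log_name + ".*")):
        if path.is_file() and path.stat().st_mtime < cutoff:
            path.unlink()
            deleted.append(path)
    return deleted


if __name__ == "__main__":
    # Prune only; call rotate(<mongod pid>) first if this script should also rotate.
    for p in prune_rotated_logs(LOG_DIR, LOG_NAME, MAX_AGE_DAYS):
        print("deleted", p)
```

Matching on the rotated-file pattern rather than on everything in the directory keeps the live log and any unrelated files safe.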

Splitting across different drives

The reason for splitting is to optimize each disk for its purpose. The journal is an append-only log that is written sequentially, so it needs fewer IOPS. Logs are likewise append-only: they just keep adding information. Data access, on the other hand, jumps around a lot when reading, and sometimes when writing too, as new writes fill gaps left by freed space. With the journal and logs written to other disks, the data disk doesn't lose time on those writes. Every little bit helps on write-intensive systems. The next step beyond that is replication and sharding to spread the load.
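If you do split the volumes, it is worth checking that the paths really land on different devices. The Python sketch below groups paths by os.stat(...).st_dev, which identifies the device a file lives on. The mount points are hypothetical (a data volume at /data, a journal volume mounted elsewhere and symlinked to /data/journal, a log volume at /log); os.stat follows symlinks, so a symlinked journal is reported on its target's device.

```python
import os
from pathlib import Path
from typing import Dict, List

# Hypothetical layout: adjust the paths to your own mount points.
PATHS = {
    "data": Path("/data"),
    "journal": Path("/data/journal"),  # assumption: symlink to a separate journal volume
    "log": Path("/log"),
}


def shared_devices(paths: Dict[str, Path]) -> Dict[int, List[str]]:
    """Group named paths by the device (st_dev) they reside on.

    Only devices hosting more than one path are returned, so an empty
    result means every path is on its own volume.
    """
    by_dev: Dict[int, List[str]] = {}
    for name, path in paths.items():
        dev = os.stat(path).st_dev  # follows symlinks to the real location
        by_dev.setdefault(dev, []).append(name)
    return {dev: names for dev, names in by_dev.items() if len(names) > 1}


if __name__ == "__main__":
    existing = {name: p for name, p in PATHS.items() if p.exists()}
    shared = shared_devices(existing)
    if not shared:
        print("each existing path is on its own device")
    for dev, names in shared.items():
        print(f"device {dev} is shared by: {', '.join(names)}")
```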

If you are interested, MongoDB University (https://university.mongodb.com) has more information about this. I'm currently following M202 MongoDB Advanced Deployment and Operations, which offers some specific advice on optimization.