MongoDB Indexes – How Are Indexes Stored on Disk?

Tags: index, mongodb

So let me first start the question with my understanding of how MongoDB stores data on disk: when you create a database, MongoDB allocates a large file named <databasename>.0, and within that file it allocates extents, which are contiguous regions that each hold the data for a particular collection or a particular index.

When this datafile fills up, MongoDB creates a new file called <databasename>.1 and populates it in the same way. It therefore seems reasonable to assume that the most recently inserted data in a given database will live in the highest-numbered file (and my performance tests confirm this).
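To make the layout concrete, here is a hypothetical listing of a dbpath for a database named `mydb` (the database name is illustrative; the sizes follow the MMAPv1 convention of a 16 MB namespace file and datafiles that start at 64 MB and double up to 2 GB):

```
mydb.ns    16 MB   (namespace file)
mydb.0     64 MB
mydb.1    128 MB
mydb.2    256 MB
```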

However, I can't see how this could be true for indexes: since an index is a B-tree, it doesn't seem possible or sensible for that B-tree to be scattered across files in the same way. As MongoDB maintains an index, does the whole index live in one extent until it outgrows it, at which point it is relocated to the current (highest-numbered) datafile?

This has become important to me because when I start a database from an Amazon EBS snapshot, there is a huge overhead for every hit on these datafiles until the volume warms up. I am only interested in a subset of the most recent N documents in a collection, so if I could be sure I only needed the most recent couple of datafiles, I could prewarm them by reading them sequentially before starting mongod.
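The prewarming idea above can be sketched as a small script. This is only a sketch under the assumptions in the question: that the datafiles are named `<dbname>.N` and that the most recent data lives in the highest-numbered files. The `datadir` path and the `mydb` name are hypothetical.

```python
import os


def prewarm(paths, chunk_size=1024 * 1024):
    """Sequentially read each file so its blocks are pulled off the
    (cold) EBS volume. Returns the total number of bytes read."""
    total = 0
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    return total


# Example: warm only the two highest-numbered datafiles of "mydb".
# (Hypothetical dbpath; adjust for your deployment.)
# datadir = "/data/db"
# files = sorted(
#     (p for p in os.listdir(datadir)
#      if p.startswith("mydb.") and p.split(".")[-1].isdigit()),
#     key=lambda p: int(p.split(".")[-1]),
# )
# prewarm([os.path.join(datadir, p) for p in files[-2:]])
```

Whether warming only the last couple of files is actually sufficient depends on the index-layout question, which the answer below addresses.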

Best Answer

The delay you are seeing when loading from a snapshot is not caused by how indexes are laid out on disk. It is far more likely that you are seeing it because, when you start an instance from a snapshot, the data is loaded only on first use, and that first use is significantly slower than subsequent ones. That is a basic limitation of using snapshots in this way, and it really has little to do with the application that is trying to access the disk; that's why you will see guides on "how to warm up an EBS volume" and the like (there are penalties on first writes, too). If you warm up the disk with another tool (dd, for example) and the performance issue goes away, then you have pretty decent proof that the layout of the data has nothing to do with the issue.

Along those lines, MongoDB has the touch command, which lets you warm up the data before you use it in anger (you can touch just the data, just the indexes, or both). Again, right after you attach the volume everything will be slow and touch is going to take a while, but once that warm-up phase is over, your results should be reasonably consistent.
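For reference, this is what touch looks like from the mongo shell; "records" is a hypothetical collection name:

```javascript
// Load both the documents and the index entries into memory:
db.runCommand({ touch: "records", data: true, index: true })

// Or warm only the indexes:
db.runCommand({ touch: "records", data: false, index: true })
```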

In terms of how things are stored on disk, you have the basics of file allocation correct, but there is a logical structure within the files, extents, and those extents are the real units of storage. That and far more is covered in detail in this presentation by Mathias Stearn, one of the kernel developers at MongoDB.

Indexes are just another (structured) form of data in MongoDB, and they are stored in linked extents throughout the files. Fragmentation can become an issue (that's what the compact command is for), as can wasted disk space (repair is used to reclaim it), but you haven't described a workload that would immediately make me think you are hitting a fragmentation issue, which is why I suspect something else (like the first-use penalty) is your root cause.
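If you do ever need them, both are run from the mongo shell; again, "records" is a hypothetical collection name, and note that on MMAPv1 compact blocks operations on the database it is run against:

```javascript
// Defragment a single collection's extents:
db.runCommand({ compact: "records" })

// Rebuild the whole database to reclaim disk space
// (requires free space roughly equal to the current data set):
db.repairDatabase()
```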