MongoDB index strategy

Tags: index, mongodb, sharding

I'm developing a system to back up computers and store the files / data with retention for a while. The actual files are stored in an object store, while I'm planning to keep the metadata of the files in MongoDB. I am, however, rather new to MongoDB.

As there will be many computers containing many many files, there will eventually be billions of (small) records / documents. I'd like to use MongoDB for its sharding features.

For each item I'm storing the following fields: computer_ID, objectstore_ID, path, name, size, last_modified, revision_start, revision_end. An example document in the collection would be:

{"computer_ID": 6829, "objectstore_ID": "abcdefghijklmnopqrstuvwxyz", "path": "/home/user/path/to/directory/", "name": "file.txt", "size": 58202, "last_modified": 1395491119, "revision_start": 52, "revision_end": 58 }

Typical queries would be listing a directory on a computer in a specific revision, that is, every file whose revision range contains that revision, for example {"computer_ID": 312, "path": "/home/", "revision_start": {"$lte": 54}, "revision_end": {"$gte": 54}}. Usually those four fields will be used in queries.

My first question is if MongoDB is the right solution for this use case, or would a relational database suit my needs better? In case of the latter, how could I distribute the data over multiple servers like MongoDB has sharding?

Secondly, what would be the best indexing strategy to maintain the best (or at least reasonable) performance when the database grows? Should I add 4 separate indexes on computer_ID, path, revision_start and revision_end? Or create one compound index containing all four? Maybe something else?

Finally, I also wonder whether the storage size of the indexes won't grow to a multiple of the original data size.

Best Answer

Logging is always a good use case for non-relational stores (it's the textbook example).

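You've only given us one query pattern, so going by that I would create a single compound index, ordered from equality to inequality: the fields you match exactly (computer_ID, path) first, followed by the fields you query by range (revision_start, revision_end).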
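As a rough sketch of what that could look like from Python: the index specification below uses the (field, direction) pair format that pymongo's `create_index` accepts, and the filter assumes a file is visible in a revision when `revision_start <= revision <= revision_end`. The names `INDEX_KEYS`, `directory_listing_filter` and `ensure_index`, as well as the database and collection names in the comments, are made up for illustration.

```python
# Equality fields first, range fields last (1 means ascending).
INDEX_KEYS = [
    ("computer_ID", 1),
    ("path", 1),
    ("revision_start", 1),
    ("revision_end", 1),
]


def directory_listing_filter(computer_id, path, revision):
    """Build the filter for 'list this directory as of this revision'.

    Assumption: a file belongs to a revision when
    revision_start <= revision <= revision_end.
    """
    return {
        "computer_ID": computer_id,          # equality match
        "path": path,                        # equality match
        "revision_start": {"$lte": revision},  # range match
        "revision_end": {"$gte": revision},    # range match
    }


def ensure_index(collection):
    """Create the compound index on a pymongo-style collection object."""
    return collection.create_index(INDEX_KEYS)


# Hypothetical usage against a running server (needs pymongo installed):
#   from pymongo import MongoClient
#   files = MongoClient()["backup"]["files"]
#   ensure_index(files)
#   for doc in files.find(directory_listing_filter(312, "/home/", 54)):
#       print(doc["name"])
```

Running the query with `explain()` should show whether this index is actually being used for your data; treat the field order above as a starting point rather than a guarantee.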

I would suggest reading up on indexes at:

MongoDB Index Introduction

and

MongoDB FAQ: Indexes