Mongodb – Hash index use case in MongoDB

index mongodb

MongoDB recently added hashed indexes, but the documentation doesn't explain much about their use cases or their advantages over a normal MongoDB index. I'd be grateful for any guidance on this.

Best Answer

Hashed indexes were added in MongoDB 2.4 (March 2013) specifically to support hashed shard keys. I'm not aware of any use case outside of sharding.

When choosing a shard key you can generally get the best outcome (an appropriate balance of read/write locality) by defining your own compound index. However, an effective shard key requires an understanding of the attributes of your chosen field(s) (e.g. cardinality, divisibility, randomness) as well as your application use case (common update/read queries). The field(s) included in your shard key index also need to be present in every document and cannot be changed after insertion.

Where there is no natural choice of shard key based on your data, a hashed shard key can be used to achieve a more uniform distribution of values, which helps distribute writes across multiple shards. The field being hashed still needs to have good cardinality (i.e. a large number of different values), so ObjectId values or timestamps work well. The downside of a hashed shard key is that it supports equality queries but cannot be used for range queries, since adjacent field values hash to effectively random positions in the index.
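As a rough sketch of what this looks like in practice, the helper below enables sharding on a database and then shards a collection on a hashed field. It assumes you are connected to a mongos in the shell, where `sh` is the built-in sharding helper. The function name and the database, collection, and field names are made up for illustration.

```javascript
// Sketch only: intended for the mongo shell connected to a mongos.
// shardOnHashedKey and its argument names are hypothetical.
function shardOnHashedKey(sh, dbName, collName, field) {
  // Sharding has to be enabled on the database before sharding a collection
  // (older server versions require this step explicitly).
  sh.enableSharding(dbName);
  // A shard key value of "hashed" requests hashed sharding on that field.
  return sh.shardCollection(dbName + "." + collName, { [field]: "hashed" });
}
```

In the shell this would be called as `shardOnHashedKey(sh, "mydb", "events", "_id")`. After that, an equality query on `_id` can be routed to a single shard, while a range query on `_id` has to be sent to every shard.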

For example, the default _id (ObjectId) includes a leading timestamp component. While the _id field is unique, immutable, and present in every document, an unhashed ObjectId is not a suitable choice for a shard key because the values are monotonically increasing. If you shard on an ever-increasing value like an ObjectId, all of the new inserts will target a single "hot" shard that currently has the highest shard key value. Documents will then have to be re-balanced to your other shards, which renders your sharding ineffective for scaling: a poorly sharded collection will be limited by the write throughput of a single shard plus the overhead of frequent re-balancing of newly inserted data between shards.
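To make the "hot shard" effect concrete, here is a small plain JavaScript simulation (no MongoDB required). It routes 1000 ever-increasing keys to four shards, once by key range and once by a hash of the key. The chunk boundaries are invented, and `mix32` is a generic 32-bit integer mixer standing in for a hash function; it is not the hash MongoDB actually uses.

```javascript
const NUM_SHARDS = 4;

// Range-based routing with hypothetical chunk boundaries that were
// chosen back when only keys 0..999 existed. The last chunk is open-ended.
const upperBounds = [250, 500, 750, Infinity];
function shardForRange(key) {
  return upperBounds.findIndex((bound) => key < bound);
}

// Stand-in hash: a 32-bit integer mixer (NOT MongoDB's real hash function).
function mix32(x) {
  x = Math.imul(x ^ (x >>> 16), 0x85ebca6b);
  x = Math.imul(x ^ (x >>> 13), 0xc2b2ae35);
  return (x ^ (x >>> 16)) >>> 0; // force unsigned 32-bit result
}

// Hash-based routing: split the 32-bit hash space into equal ranges.
function shardForHashed(key) {
  return Math.floor(mix32(key) / (2 ** 32 / NUM_SHARDS));
}

// Count how many of `count` consecutive keys land on each shard.
function countInserts(route, firstKey, count) {
  const perShard = new Array(NUM_SHARDS).fill(0);
  for (let key = firstKey; key < firstKey + count; key++) {
    perShard[route(key)]++;
  }
  return perShard;
}

// New inserts use keys 1000..1999, all above the last boundary.
const rangeCounts = countInserts(shardForRange, 1000, 1000);
const hashedCounts = countInserts(shardForHashed, 1000, 1000);

console.log("range :", rangeCounts); // [ 0, 0, 0, 1000 ]: one hot shard
console.log("hashed:", hashedCounts); // spread across the shards
```

With range routing every new key falls into the open-ended last chunk, so one shard takes all the writes until the balancer moves data away. With hashed routing, neighbouring keys end up in unrelated parts of the hash space, which is the same property that makes range queries on the hashed field impossible to target.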

For more information, see: Shard a Collection Using a Hashed Shard Key.

Related Solutions

Mongodb – How to index dynamic attributes in MongoDB

I also asked this same question (in a slightly expanded form) on the mongodb-user mailing list, where I got an answer. Read the thread there for more details. The short answer is that the strategy used in the question should work fine, but there's an issue that makes it very inefficient. Hopefully, the issue will be fixed soon.

For my case, I only need to query for exact matches for the tuple {n,v}, so I can create a multikey index:

db.mycollection.ensureIndex({"attrs":1})

and make the query like this:

db.mycollection.find({"attrs": {n: "subject", v: "Some subject"}})

which works great and uses the index very effectively.

MongoDB index strategy

Logging is always a good use case for non-relational stores (it's the textbook example).

You've only given us one query pattern, so going by that I would create a single compound index, ordered from equality to inequality: the fields matched by equality come first, followed by the fields matched by a range.
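As an illustration of that ordering, here is a minimal sketch. The schema from the original question isn't shown here, so the field names (`level`, `ts`) and both helper functions are purely hypothetical; `coll` is assumed to be a collection object such as `db.logs` in the shell.

```javascript
// Hypothetical log document shape: { level: "error", ts: <Date>, msg: "..." }

// Equality field (level) first, range field (ts) second.
function createLogIndex(coll) {
  return coll.createIndex({ level: 1, ts: 1 });
}

// A query this index is meant to serve: exact match on level,
// range match on ts.
function findErrorsSince(coll, since) {
  return coll.find({ level: "error", ts: { $gte: since } });
}
```

In the shell these would be called as `createLogIndex(db.logs)` and `findErrorsSince(db.logs, new Date("2013-01-01"))`. Putting the equality field first lets the index narrow down to one `level` value and then scan a contiguous range of `ts` values within it.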

I would suggest reading up on indexes at:

MongoDB Index Introduction

and

MongoDB FAQ: Indexes
