MongoDB – Hash index use case in MongoDB

index mongodb

MongoDB recently added hashed indexes, but the documentation doesn't explain much about their use cases or their advantages over a regular MongoDB index. I'd be grateful if you could help me learn more about them.

Best Answer

Hashed indexes were added in MongoDB 2.4 (March 2013) specifically to support hashed shard keys. I'm not aware of any use case outside of sharding.

When choosing a shard key, you can generally get the best outcome (an appropriate balance of read/write locality) by defining your own compound index. However, an effective shard key requires an understanding of the attributes of your chosen field(s) (e.g. cardinality, divisibility, randomness) as well as your application's use case (common update/read queries). The field(s) included in your shard key index also need to be present in every document and cannot be changed after insertion.

Where there is no natural choice of shard key based on your data, a hashed shard key can be used to achieve a more uniform distribution of values, which helps distribute writes across multiple shards. The field being hashed still needs to have good cardinality (i.e. a large number of different values), so ObjectId values or timestamps work well. The downside of a hashed shard key is that it supports equality queries but cannot be used for range queries, since the hashed values in the index are effectively randomly distributed.
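As a rough illustration of both points (even write distribution, no range locality), here is a minimal Python sketch. It is a simulation, not MongoDB code: `NUM_SHARDS`, `hashed_key` and `shard_for_hash` are hypothetical names, MD5 is only a stand-in for whatever hash function the server uses, and the equal split of the hash space is an assumption made for simplicity.

```python
import hashlib
from collections import Counter

NUM_SHARDS = 4  # hypothetical cluster size


def hashed_key(value: int) -> int:
    """Map a value to a 64-bit integer (stand-in for a hashed index entry)."""
    digest = hashlib.md5(str(value).encode()).digest()
    return int.from_bytes(digest[:8], "big")


def shard_for_hash(value: int) -> int:
    """Pick a shard by splitting the 64-bit hash space into equal ranges."""
    return hashed_key(value) * NUM_SHARDS // 2**64


# Monotonically increasing keys, such as counters or timestamps.
keys = range(1_000_000, 1_010_000)
per_shard = Counter(shard_for_hash(k) for k in keys)
print(sorted(per_shard.items()))  # roughly 2,500 keys per shard

# Consecutive keys end up on unrelated shards, so a range query on the
# original field cannot be routed to a single shard.
print([shard_for_hash(k) for k in range(1_000_000, 1_000_010)])
```

An equality lookup can still be routed to one shard by hashing the value being searched for, which is why equality queries remain efficient while range queries have to be sent to every shard.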

For example, the default _id (ObjectId) includes a leading timestamp component. While the _id field is unique, immutable, and present in every document, an unhashed ObjectId is not a suitable choice for a shard key because the values are monotonically increasing. If you shard on an ever-increasing value like an ObjectId, all new inserts will target a single "hot" shard that currently holds the highest shard key value. Documents will then have to be re-balanced to your other shards, which renders your sharding ineffective for scaling: a poorly sharded collection will be limited by the write throughput of a single shard plus the overhead of frequent re-balancing of newly inserted data between shards.
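The "hot shard" effect can be sketched with a small Python simulation of range-based chunk assignment. The boundaries in `LOWER_BOUNDS` and the helper `shard_for_range` are hypothetical and only illustrate the idea; real chunk ranges are managed by the cluster and change as chunks split and migrate.

```python
import bisect
from collections import Counter

# Hypothetical chunk lower bounds for a range-sharded collection:
# shard i owns keys from LOWER_BOUNDS[i] up to the next bound, and the
# last shard owns everything above its lower bound. Keys are assumed >= 0.
LOWER_BOUNDS = [0, 1_000, 2_000, 3_000]


def shard_for_range(key: int) -> int:
    """Return the index of the shard whose key range contains `key`."""
    return bisect.bisect_right(LOWER_BOUNDS, key) - 1


# New inserts always carry keys larger than anything seen so far.
new_inserts = range(5_000, 6_000)
writes = Counter(shard_for_range(k) for k in new_inserts)
print(dict(writes))  # → {3: 1000}
```

Every one of the 1,000 new writes lands on the last shard, which matches the single-shard write bottleneck described above.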

For more information, see: Shard a Collection Using a Hashed Shard Key.