Recently MongoDB added hashed indexes to its features, but the documentation doesn't explain much about their use cases or their advantages over a normal MongoDB index. I'd be grateful if you could help me learn more about them.
MongoDB – Hash index use case in MongoDB
Best Answer
Hashed indexes were added in MongoDB 2.4 (March 2013) specifically to support hashed shard keys. I'm not aware of any use case outside of sharding.
When choosing a shard key you can generally get the best outcome (an appropriate balance of read/write locality) by defining your own compound shard key index. However, an effective shard key requires an understanding of the attributes of your chosen field(s) (e.g. cardinality, divisibility, randomness) as well as your application's use case (common update/read queries). In the MongoDB releases current at the time (2.4), the field(s) included in your shard key index also need to be present in every document and cannot be changed after insertion.
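The shard key attributes mentioned here can be checked mechanically before sharding. The sketch below is a hypothetical helper (shard_key_report is an illustrative name, not a MongoDB API) that works on plain Python dicts standing in for documents. It counts documents missing the key fields, the key's cardinality, and the largest group of documents sharing one key value, since documents with identical shard key values cannot be split across chunks.

```python
from collections import Counter
from collections.abc import Iterable, Mapping, Sequence
from typing import Any


def shard_key_report(docs: Iterable[Mapping[str, Any]], fields: Sequence[str]) -> dict[str, int]:
    """Summarise how suitable `fields` would be as a shard key for `docs`.

    Hypothetical helper: it inspects in-memory dicts, not a live collection.
    """
    total = 0
    missing = 0
    groups: Counter = Counter()
    for doc in docs:
        total += 1
        if any(field not in doc for field in fields):
            missing += 1  # shard key fields must be present in every document
            continue
        groups[tuple(doc[field] for field in fields)] += 1
    return {
        "documents": total,
        "missing_key": missing,
        "distinct_values": len(groups),  # cardinality of the candidate key
        # Documents sharing one key value cannot be split into separate chunks.
        "largest_group": max(groups.values(), default=0),
    }


docs = [
    {"user": "a", "ts": 1},
    {"user": "a", "ts": 2},
    {"user": "b", "ts": 3},
    {"ts": 4},
]
print(shard_key_report(docs, ["user"]))
# {'documents': 4, 'missing_key': 1, 'distinct_values': 2, 'largest_group': 2}
print(shard_key_report(docs, ["ts"]))
# {'documents': 4, 'missing_key': 0, 'distinct_values': 4, 'largest_group': 1}
```

Here "user" is a weak candidate (one document lacks it and half the data shares a single value), while the timestamp-like "ts" has full coverage and high cardinality, though, as the answer goes on to explain, a monotonically increasing value brings its own problem.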
Where there is no natural choice of shard key based on your data, a hashed shard key can be used to achieve a more uniform distribution of values, which helps distribute writes across multiple shards. The field being hashed still needs good cardinality (i.e. a large number of different values), so ObjectId values or timestamps work well. The downside of a hashed shard key is that it supports equality queries but cannot be used for range queries, since the values in the index are effectively randomly distributed.

For example, the default _id (ObjectId) includes a leading timestamp component. While the _id field is unique, immutable, and present in every document, an ObjectId is not a suitable choice for a regular (ranged) shard key because its values are monotonically increasing. If you shard on an ever-increasing value like an ObjectId, all new inserts will target a single "hot" shard: the one that currently holds the highest shard key values. Documents will then have to be rebalanced to your other shards, which makes sharding ineffective for scaling. A poorly sharded collection will be limited by the write throughput of a single shard plus the overhead of frequently rebalancing newly inserted data between shards.

For more information, see: Shard a Collection Using a Hashed Shard Key.
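To make the hot-shard argument concrete, here is a small simulation. It is not MongoDB code, and every name and number in it is hypothetical: four shards, integer keys standing in for ObjectId or timestamp values, range placement by fixed split points, and MD5 (truncated to 64 bits) as a stand-in for MongoDB's actual hash function. It compares where 1,000 ever-increasing inserts land under a ranged key versus a hashed key, and how many shards a small range query must touch in each case.

```python
import bisect
import hashlib
import struct
from collections import Counter


def stand_in_hash(key: int) -> int:
    """Map a non-negative integer key to a signed 64-bit value.

    Stand-in only: MD5 (hashlib) truncated to 64 bits, used to mimic a
    uniform hash. It is NOT claimed to match MongoDB's hashed-index function.
    """
    digest = hashlib.md5(key.to_bytes(8, "big")).digest()
    return struct.unpack("<q", digest[:8])[0]


def place(value: int, split_points: list[int]) -> int:
    """Range placement: shard i owns [split_points[i-1], split_points[i])."""
    return bisect.bisect_right(split_points, value)


# Ranged key: hypothetical split points that balance the existing keys 0..999.
ranged_splits = [250, 500, 750]
# Hashed key: split the signed 64-bit hash space into four equal ranges.
hashed_splits = [-(2**62), 0, 2**62]

# New inserts carry ever-increasing keys (like ObjectId or timestamps).
new_keys = range(1000, 2000)
ranged_load = Counter(place(k, ranged_splits) for k in new_keys)
hashed_load = Counter(place(stand_in_hash(k), hashed_splits) for k in new_keys)

# A range query 100 <= key < 110: which shards hold matching documents?
query_keys = range(100, 110)
ranged_targets = {place(k, ranged_splits) for k in query_keys}
hashed_targets = {place(stand_in_hash(k), hashed_splits) for k in query_keys}

print("ranged inserts per shard:", dict(ranged_load))
print("hashed inserts per shard:", dict(hashed_load))
print("shards hit by range query (ranged):", sorted(ranged_targets))
print("shards hit by range query (hashed):", sorted(hashed_targets))
```

With the ranged key every new insert lands on shard 3, the shard owning the top of the key range, while the range query touches only shard 0. With the hashed key the inserts spread across all four shards, but the same ten-key range query is scattered over several shards: the equality-versus-range trade-off described in the answer.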