MongoDB Sharding – Using Compound Shard Key with _id Field

mongodb, mongodb-3.2, sharding

I have documents like:

{_id: "someid1", "bar": "somevaluebar1"}
{_id: "someid2", "foo": "somevaluefoo2", "bar": "somevaluebar2"}
{_id: "someid3", "foo": "somevaluefoo3", "zoo": "somevaluezoo3"}
{_id: "someid4", "zoo": "somevaluezoo4"}

We query documents most often by "foo" and second most often by "bar". Does it make sense to create a compound shard key like { "foo" : 1, "bar" : 1, "_id" : 1 }, given that "foo" and "bar" might be missing from some documents?

When I tried to run this command

sh.shardCollection("<your-db>", {{ "foo" : 1, "bar" : 1, "_id" : 1 }:"hashed"})

it gave me a syntax error.

Best Answer

You'll need to rethink your shard key approach.

As of MongoDB 3.2:

  • All fields in a compound shard key must be present in all documents and will be immutable (i.e. the shard key for an existing document cannot be changed).

  • A hashed shard key is based on a single field, and does not support range queries.
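To make the first restriction concrete, here is a small sketch in plain JavaScript (no MongoDB connection needed) that checks the four sample documents from the question against the proposed key fields. The variable names are only illustrative.

```javascript
// Sample documents copied from the question.
const docs = [
  { _id: "someid1", bar: "somevaluebar1" },
  { _id: "someid2", foo: "somevaluefoo2", bar: "somevaluebar2" },
  { _id: "someid3", foo: "somevaluefoo3", zoo: "somevaluezoo3" },
  { _id: "someid4", zoo: "somevaluezoo4" }
];

// Fields of the proposed compound shard key { foo: 1, bar: 1, _id: 1 }.
const keyFields = ["foo", "bar", "_id"];

// Collect the _id of every document lacking at least one key field.
const missing = docs
  .filter((d) => !keyFields.every((f) => f in d))
  .map((d) => d._id);

console.log(missing); // [ 'someid1', 'someid3', 'someid4' ]
```

Only one of the four documents carries every field of the proposed key, which is why that key is not usable here.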

It generally makes sense to have a shard key that supports your common queries so they can be targeted at a subset of shards with relevant data, but this doesn't appear to be possible in your case as both foo and bar are optional fields.

If your _id field provides good cardinality (i.e. a large number of distinct values) but is monotonically increasing (e.g. default ObjectIds), you could consider a hashed shard key on the _id field for good write distribution. The hashed index wouldn't support your common read queries (except lookups by specific _id values), so you would need a secondary index for your queries on foo and bar (i.e. {foo:1, bar:1}). The recommended secondary index(es) and field order will depend on your common queries and sort order.
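As a rough sketch of that suggestion, the commands could look like the following. The database and collection names ("mydb", "things") and the wrapper function are placeholders of mine, not from your setup. `sh` and `db` are the mongo shell's built-in helpers; they are passed in as parameters only so the function can be tried without a live cluster. Note that the first argument to sh.shardCollection() is the full "database.collection" namespace, and a hashed key is written as { field: "hashed" }.

```javascript
// Hypothetical helper: shard a collection on hashed _id and add a
// secondary index for the common foo/bar queries.
function shardOnHashedId(sh, db, dbName, collName) {
  const ns = dbName + "." + collName;

  // Sharding has to be enabled on the database before sharding a collection.
  sh.enableSharding(dbName);

  // Hashed shard key on the single _id field.
  // Assumption: the collection is empty; if it already holds data, the
  // { _id: "hashed" } index has to be created before this call.
  sh.shardCollection(ns, { _id: "hashed" });

  // Secondary index to support queries on foo, then bar.
  db.getSiblingDB(dbName)
    .getCollection(collName)
    .createIndex({ foo: 1, bar: 1 });

  return ns;
}
```

In the shell this would be invoked as shardOnHashedId(sh, db, "mydb", "things"). Queries on foo or bar would still be sent to all shards (scatter-gather), but each shard can use the secondary index locally.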

For further background information I suggest reviewing: