There's just a slight issue with how you are passing the `$minKey` values in. Try this instead:
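```javascript
// Split so that each region starts its own chunk. Using MinKey for foo and bar
// makes each split point sort before every foo/bar value within that region.
db.adminCommand( { split : "mydb.mycollection" , middle : { "region" : "region1", "foo" : MinKey , "bar" : MinKey } } );
db.adminCommand( { split : "mydb.mycollection" , middle : { "region" : "region2", "foo" : MinKey , "bar" : MinKey } } );
```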
This got me the following layout:
```
sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("53a2cd9d98b4ace818666544")
  }
  shards:
    { "_id" : "shard0000", "host" : "localhost:30000" }
    { "_id" : "shard0001", "host" : "localhost:30001" }
    { "_id" : "shard0002", "host" : "localhost:30002" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "mydb", "partitioned" : true, "primary" : "shard0001" }
      mydb.mycollection
        shard key: { "region" : 1, "foo" : 1, "bar" : 1 }
        chunks:
          shard0000  1
          shard0001  2
        {
          "region" : { "$minKey" : 1 },
          "foo" : { "$minKey" : 1 },
          "bar" : { "$minKey" : 1 }
        } -->> {
          "region" : "region1",
          "foo" : { "$minKey" : 1 },
          "bar" : { "$minKey" : 1 }
        } on : shard0000 Timestamp(2, 0)
        {
          "region" : "region1",
          "foo" : { "$minKey" : 1 },
          "bar" : { "$minKey" : 1 }
        } -->> {
          "region" : "region2",
          "foo" : { "$minKey" : 1 },
          "bar" : { "$minKey" : 1 }
        } on : shard0001 Timestamp(2, 2)
        {
          "region" : "region2",
          "foo" : { "$minKey" : 1 },
          "bar" : { "$minKey" : 1 }
        } -->> {
          "region" : { "$maxKey" : 1 },
          "foo" : { "$maxKey" : 1 },
          "bar" : { "$maxKey" : 1 }
        } on : shard0001 Timestamp(2, 3)
```
The use of the `$minKey` (`MinKey`) and `$maxKey` (`MaxKey`) values is a bit tough to tease out (they are rarely used except internally), but there is a decent and illustrative example in the docs.
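To see why splitting at `{ "region" : "region1", "foo" : MinKey, "bar" : MinKey }` puts every `region1` document into the middle chunk regardless of its `foo` and `bar` values, here is a small plain-JavaScript (Node) model of how compound shard key values compare. It is a sketch, not MongoDB code: `MIN_KEY`/`MAX_KEY` are hypothetical sentinels standing in for the shell's `MinKey`/`MaxKey`, all real values are assumed to be strings, and the chunk table is transcribed by hand from the `sh.status()` output above.

```javascript
// Hypothetical sentinels standing in for the shell's MinKey / MaxKey.
const MIN_KEY = Symbol("MinKey");
const MAX_KEY = Symbol("MaxKey");

// Shard key order from sh.status(): { region: 1, foo: 1, bar: 1 }.
const SHARD_KEY = ["region", "foo", "bar"];

// MinKey sorts before every value and MaxKey after every value.
// Simplification: all real values are assumed to be strings.
function compareValue(a, b) {
  if (a === b) return 0;
  if (a === MIN_KEY || b === MAX_KEY) return -1;
  if (a === MAX_KEY || b === MIN_KEY) return 1;
  return a < b ? -1 : 1;
}

// Compound keys compare field by field, in shard key order.
function compareKey(a, b) {
  for (const field of SHARD_KEY) {
    const c = compareValue(a[field], b[field]);
    if (c !== 0) return c;
  }
  return 0;
}

// Lower bound of a chunk that starts at the very beginning of `region`.
const regionStart = (region) => ({ region, foo: MIN_KEY, bar: MIN_KEY });

// Chunk table transcribed from sh.status() (min inclusive, max exclusive).
const CHUNKS = [
  { min: regionStart(MIN_KEY), max: regionStart("region1"), shard: "shard0000" },
  { min: regionStart("region1"), max: regionStart("region2"), shard: "shard0001" },
  { min: regionStart("region2"), max: { region: MAX_KEY, foo: MAX_KEY, bar: MAX_KEY }, shard: "shard0001" },
];

// Index of the chunk whose [min, max) range contains the document's shard key.
function findChunk(doc) {
  const i = CHUNKS.findIndex(
    (c) => compareKey(c.min, doc) <= 0 && compareKey(doc, c.max) < 0
  );
  if (i === -1) throw new Error("no chunk covers this key");
  return i;
}

for (const doc of [
  { region: "region0", foo: "x", bar: "y" },
  { region: "region1", foo: "a", bar: "b" },
  { region: "region2", foo: "z", bar: "z" },
]) {
  const i = findChunk(doc);
  console.log(doc.region, "-> chunk", i, "on", CHUNKS[i].shard);
}
```

Because `MinKey` sorts below any `foo` value, `{ region: "region1", foo: "a" }` is already past the first chunk's upper bound, so each region's documents start in their own chunk.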
You should be able to use the index listed to support the shard key. It is a superset of your shard key fields, which works as long as the shard key fields form a prefix of that index.

The shard key listed should be fine for distributing write load, provided you don't expect any individual articleId/host pair to take the bulk of your writes at any given point in time.

I would be concerned about this shard key for reads. To target a query at a single shard, the query needs to include the shard key values. My guess is that your queries do not include timestamp. Without timestamp, your queries will be sent to every shard, which is inefficient. With scatter-gather reads, you hamper your ability to scale reads by adding shards.
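A rough plain-JavaScript model of the routing rule described above follows. It is a sketch, not the mongos algorithm: the shard key field order (`timestamp` first) is a hypothetical assumption, since the key's actual field order isn't shown here, and the operator detection is deliberately simplistic. A query with equality on every shard key field can go to a single shard, a query that includes the leading shard key field can be targeted at the shards whose chunks cover it, and anything else is broadcast.

```javascript
// Hypothetical shard key, for illustration only: timestamp is ASSUMED to be the
// leading field. The real field order of your shard key decides the outcome.
const SHARD_KEY = ["timestamp", "articleId", "host"];

const has = (obj, field) => Object.prototype.hasOwnProperty.call(obj, field);

// Treat values like { $gte: ... } or { $in: [...] } as non-equality conditions.
// Simplification: { $eq: ... } is also counted as non-equality here.
const isOperatorExpr = (v) =>
  v !== null && typeof v === "object" && Object.keys(v).some((k) => k.startsWith("$"));

// Rough model of how mongos routes a find() on a range-sharded collection.
function routingFor(shardKey, query) {
  const allEquality = shardKey.every((f) => has(query, f) && !isOperatorExpr(query[f]));
  if (allEquality) return "single shard";                 // an exact key lives in one chunk
  if (has(query, shardKey[0])) return "subset of shards"; // prefix present; may still span several shards
  return "scatter-gather";                                // no prefix: every shard is asked
}

console.log(routingFor(SHARD_KEY, { timestamp: 1466000000, articleId: 42, host: "a.example" }));
console.log(routingFor(SHARD_KEY, { timestamp: { $gte: 1466000000 } }));
console.log(routingFor(SHARD_KEY, { articleId: 42, host: "a.example" }));
```

These print `single shard`, `subset of shards` and `scatter-gather` respectively. If timestamp were instead the last field of the key, a query on articleId and host alone would still include the prefix and could be targeted, so check the actual field order before drawing conclusions.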
You'll need to rethink your shard key approach.
As of MongoDB 3.2:

- All fields in a compound shard key must be present in every document, and the shard key is immutable (i.e. the shard key of an existing document cannot be changed).
- A hashed shard key is based on a single field and does not support range queries.
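The hashed shard key trade-off can be illustrated with a plain Node.js simulation (not MongoDB code). It assumes three shards, hypothetical range split points at 1000 and 2000, and integer ids standing in for monotonically increasing values such as ObjectIds. MD5 from Node's `crypto` module is only a stand-in for MongoDB's own hash function, and taking the hash modulo the shard count is a simplification: real hashed chunks are ranges of hash values.

```javascript
const crypto = require("crypto");

const SHARDS = ["shard0000", "shard0001", "shard0002"];

// Range-based placement with hypothetical split points chosen from earlier data.
// Every id at or above the last split point falls into the final (MaxKey) chunk.
const SPLIT_POINTS = [1000, 2000];
function rangeShard(id) {
  let i = 0;
  while (i < SPLIT_POINTS.length && id >= SPLIT_POINTS[i]) i++;
  return SHARDS[i];
}

// Hash-based placement: hash the key, then map the hash to a shard.
// MD5 and the modulo are stand-ins; MongoDB computes its own hash and
// assigns chunks as ranges of hash values.
function hashedShard(id) {
  const digest = crypto.createHash("md5").update(String(id)).digest();
  return SHARDS[digest.readUInt32BE(0) % SHARDS.length];
}

// Count how many ids each placement function sends to each shard.
function distribution(ids, placeFn) {
  const counts = Object.fromEntries(SHARDS.map((s) => [s, 0]));
  for (const id of ids) counts[placeFn(id)]++;
  return counts;
}

// 3,000 new, monotonically increasing ids (all above the existing split points).
const newIds = Array.from({ length: 3000 }, (_, i) => 5000 + i);
console.log("range :", distribution(newIds, rangeShard));
console.log("hashed:", distribution(newIds, hashedShard));
```

With range placement all 3,000 new ids land on the shard owning the last chunk; with hashed placement they spread roughly evenly. The flip side is that consecutive ids end up on unrelated shards, which is why a range query on the hashed field cannot be targeted.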
It generally makes sense to have a shard key that supports your common queries so they can be targeted at the subset of shards holding relevant data, but this doesn't appear to be possible in your case because both `foo` and `bar` are optional fields.

If your `_id` field provides good cardinality (i.e. a large number of distinct values) but is monotonically increasing (e.g. default ObjectIds), you could consider a hashed shard key on the `_id` field for good write distribution. The hashed index wouldn't support your common read queries (except lookups by specific `_id` values), so you would need a secondary index for your queries on `foo` and `bar` (i.e. `{foo: 1, bar: 1}`). The recommended secondary index(es) and their field order will depend on your common queries and sort order.

For further background information I suggest reviewing: