Just a slight issue with how you are passing the MinKey values in; try this instead:
db.adminCommand( { split : "mydb.mycollection" , middle : { "region" : "region1", "foo" : MinKey , "bar" : MinKey } } );
db.adminCommand( { split : "mydb.mycollection" , middle : { "region" : "region2", "foo" : MinKey , "bar" : MinKey } } );
This got me the following layout:
sh.status()
--- Sharding Status ---
  sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("53a2cd9d98b4ace818666544")
  }
  shards:
    { "_id" : "shard0000", "host" : "localhost:30000" }
    { "_id" : "shard0001", "host" : "localhost:30001" }
    { "_id" : "shard0002", "host" : "localhost:30002" }
  databases:
    { "_id" : "admin", "partitioned" : false, "primary" : "config" }
    { "_id" : "mydb", "partitioned" : true, "primary" : "shard0001" }
      mydb.mycollection
        shard key: { "region" : 1, "foo" : 1, "bar" : 1 }
        chunks:
          shard0000  1
          shard0001  2
        { "region" : { "$minKey" : 1 }, "foo" : { "$minKey" : 1 }, "bar" : { "$minKey" : 1 } } -->> { "region" : "region1", "foo" : { "$minKey" : 1 }, "bar" : { "$minKey" : 1 } } on : shard0000 Timestamp(2, 0)
        { "region" : "region1", "foo" : { "$minKey" : 1 }, "bar" : { "$minKey" : 1 } } -->> { "region" : "region2", "foo" : { "$minKey" : 1 }, "bar" : { "$minKey" : 1 } } on : shard0001 Timestamp(2, 2)
        { "region" : "region2", "foo" : { "$minKey" : 1 }, "bar" : { "$minKey" : 1 } } -->> { "region" : { "$maxKey" : 1 }, "foo" : { "$maxKey" : 1 }, "bar" : { "$maxKey" : 1 } } on : shard0001 Timestamp(2, 3)
The use of the $minKey (MinKey) and $maxKey (MaxKey) values is a bit tough to tease out (they are rarely used except internally), but there is a decent, illustrative example in the docs.
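As a rough mental model, here is a plain-JavaScript sketch (not shell code; the MINKEY/MAXKEY sentinels and the comparator are my own illustrative stand-ins, not the real BSON comparison rules) of why the splits above work: chunk boundaries are lexicographic comparisons over the compound shard key, with MinKey sorting below and MaxKey above every other value.

```javascript
// Illustrative sentinels standing in for BSON MinKey/MaxKey (assumption, not the driver types)
const MINKEY = Symbol("MinKey");
const MAXKEY = Symbol("MaxKey");

// Compare two shard-key values, treating the sentinels as below/above everything else
function cmpValue(a, b) {
  if (a === b) return 0;
  if (a === MINKEY || b === MAXKEY) return -1;
  if (a === MAXKEY || b === MINKEY) return 1;
  return a < b ? -1 : a > b ? 1 : 0;
}

// Lexicographic comparison over the compound key [region, foo, bar]
function cmpKey(a, b) {
  for (let i = 0; i < a.length; i++) {
    const c = cmpValue(a[i], b[i]);
    if (c !== 0) return c;
  }
  return 0;
}

// A chunk owns every key in the half-open range [min, max)
function chunkOwns(min, max, key) {
  return cmpKey(min, key) <= 0 && cmpKey(key, max) < 0;
}

// The three chunks from the sh.status() output above
const chunk1 = [[MINKEY, MINKEY, MINKEY], ["region1", MINKEY, MINKEY]];
const chunk2 = [["region1", MINKEY, MINKEY], ["region2", MINKEY, MINKEY]];
const chunk3 = [["region2", MINKEY, MINKEY], [MAXKEY, MAXKEY, MAXKEY]];

// Every region1 document lands in chunk2, every region2 document in chunk3
console.log(chunkOwns(...chunk2, ["region1", 42, "x"])); // true
console.log(chunkOwns(...chunk3, ["region2", 0, ""]));   // true
console.log(chunkOwns(...chunk1, ["region1", 42, "x"])); // false
```

Because the split points use MinKey for the trailing fields, each region's documents begin exactly at its chunk's lower bound, which is what makes the per-region split clean.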
I'm new to MongoDB, but this is how I learned to do the same thing after taking the MongoDB 202 class. There are other ways to balance traffic, such as moving, splitting, and merging chunks. I haven't read anywhere that directly updating the tags collection in the config database is dirty or incorrect.
The code below is untested, and you will have to replace placeholder values such as nameofshardN and the minimum/maximum values. Note that I stop the balancer before modifying the tag ranges:
// stop the balancer so no migrations run while we edit the metadata
sh.stopBalancer()

// remove the tags from the shards
sh.removeShardTag("nameofshardN", "rangeTime1")
sh.removeShardTag("nameofshardN+1", "rangeTime2")

// remove the existing tag ranges (the tags collection lives in the config database)
use config
db.tags.remove({ "_id" : { "ns" : "testlab.range", "min" : { "_id" : 1 } }, "ns" : "testlab.range", "min" : { "_id" : 1 }, "max" : { "_id" : 100 }, "tag" : "rangeTime1" })
db.tags.remove({ "_id" : { "ns" : "testlab.range", "min" : { "_id" : 100 } }, "ns" : "testlab.range", "min" : { "_id" : 100 }, "max" : { "_id" : 200 }, "tag" : "rangeTime2" })

// add the new tag ranges
sh.addTagRange("testlab.range", { _id : new_minimum_value }, { _id : new_maximum_value }, "rangeTime1")
sh.addTagRange("testlab.range", { _id : new_minimum_value }, { _id : new_maximum_value }, "rangeTime2")

// re-tag the shards
sh.addShardTag("nameofshardN", "rangeTime1")
sh.addShardTag("nameofshardN+1", "rangeTime2")

// start the balancer again
sh.startBalancer()
Use sh.status() or mongostat --port 27017 --discover to verify balancing.
test.lab is not a valid database name, so I changed my database name to testlab.
One additional remark about the _id field not being updated. When I query the tags collection like so:
mongos> db.tags.find({}, {_id:1})
{ "_id" : { "ns" : "testlab.range", "min" : { "_id" : 1 } } }
{ "_id" : { "ns" : "testlab.range", "min" : { "_id" : 100 } } }
It shows that the _id field contains the min value of the tag range. If you are updating the tags collection and modifying the tag's min field value, then wouldn't it also be reasonable to update the _id field's min value as well?
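To make the invariant concrete, here is a plain-JavaScript sketch (not shell code; the helper names makeTagDoc and idConsistent are my own, purely for illustration): the document's _id simply duplicates the ns and min fields, so an in-place update that changes only the top-level min leaves the two out of sync.

```javascript
// Build a config.tags-style document; its _id mirrors the ns and min fields
function makeTagDoc(ns, min, max, tag) {
  return { _id: { ns, min }, ns, min, max, tag };
}

// The invariant in question: _id.min must match the top-level min
function idConsistent(doc) {
  return doc._id.ns === doc.ns &&
         JSON.stringify(doc._id.min) === JSON.stringify(doc.min);
}

const tagDoc = makeTagDoc("testlab.range", { _id: 1 }, { _id: 100 }, "rangeTime1");
console.log(idConsistent(tagDoc)); // true

// Updating only the top-level min (as an in-place update would) breaks the invariant
tagDoc.min = { _id: 50 };
console.log(idConsistent(tagDoc)); // false
```

This is one reason removing the old tag-range documents and re-adding them with sh.addTagRange (which builds a fresh _id) is cleaner than editing min in place.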
Best Answer
Your understanding is correct: a chunk can only be associated with a single shard. A chunk range representing a single shard key value will be an indivisible chunk, so assigning this range to multiple shards will not result in data distribution across multiple shards.
If the data in this chunk eventually exceeds the chunk size limit (64 megabytes by default), the chunk will be flagged as a jumbo chunk and will be ignored for future migration activity.
Shard keys with limited cardinality (or outliers with high frequency) should generally be avoided as they will lead to data imbalance and performance challenges.
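A quick way to see why (plain JavaScript, purely illustrative): no matter how many documents you insert, a shard key with only two distinct values can never produce more than two splittable ranges, so at most two shards can ever hold data and each range grows without bound.

```javascript
// Simulate 10,000 documents under a two-value shard key
const docs = Array.from({ length: 10000 }, (_, i) => ({ region: "region" + (i % 2), i }));

// Each distinct shard-key value is an indivisible range: count the distinct values
const distinctKeys = new Set(docs.map((d) => d.region));
console.log(distinctKeys.size); // 2 -> at most 2 chunks can ever be split out and migrated
```

Adding a higher-cardinality field after the low-cardinality one (as the region/foo/bar compound key in the first answer does) restores the ability to split within a region.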