Math first: you moved 150M documents in 2 days, which is roughly 870 documents per second including metadata and indices, with all reading and writing happening on the same machine. That is not what I would call slow. The description that comes to my mind is "lightning fast". ;)
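For reference, here is the back-of-the-envelope calculation as a small sketch. It only uses the two figures from the question (150M documents, 2 days); the variable names are mine.

```javascript
// Rough throughput of the migration: documents moved divided by elapsed seconds.
const documents = 150e6;           // 150M documents
const seconds = 2 * 24 * 60 * 60;  // 2 days = 172,800 seconds
const docsPerSecond = documents / seconds;

console.log(Math.round(docsPerSecond)); // prints 868
```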
Since there is no real distribution of the write load, an easy way to speed things up is to add two or more machines.
A few notes: sharding production data on non-replica shards is dangerous, to say the least. If one of the shards fails, the data it contains is unavailable until you get the shard up and running again. Plus, since there is no server to write to for that key range, values in that range cannot be written. If the shard were a replica set, the failure would lead to the election of a new primary, to which all writes and (depending on the configuration) most or even all reads would go.
_id can be used as a shard key; however, it should be hashed.
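The usual reason for hashing: default ObjectId values start with a timestamp, so they grow over time, and with a plain range key every new insert lands in the chunk that holds the highest range. The toy model below illustrates that effect. It is only a sketch: the three shards, the split points at 1000 and 2000, and the multiplicative stand-in hash are all assumptions of mine, not MongoDB's real hash function or balancer.

```javascript
// Toy model of three shards. NOT MongoDB's real hash function or balancer.
const SHARDS = 3;

// Range sharding: each chunk is a contiguous key range (hypothetical split points).
function rangeShard(id) {
  if (id < 1000) return 0;
  if (id < 2000) return 1;
  return 2; // open-ended top chunk: [2000, +infinity)
}

// Hashed sharding: hash the key first, then split the hash space into ranges.
// Stand-in hash: a 32-bit multiplicative hash, chosen only for illustration.
function hashedShard(id) {
  const h = Math.imul(id, 2654435761) >>> 0; // unsigned 32-bit value
  return Math.floor((h / 2 ** 32) * SHARDS);
}

// Count how many of n consecutive ids each shard would receive.
function countWrites(pickShard, firstId, n) {
  const counts = new Array(SHARDS).fill(0);
  for (let id = firstId; id < firstId + n; id++) counts[pickShard(id)]++;
  return counts;
}

// 3000 inserts with ever-increasing ids, all above the highest split point.
const rangeCounts = countWrites(rangeShard, 5000, 3000);
const hashedCounts = countWrites(hashedShard, 5000, 3000);

console.log("range :", rangeCounts);  // every write hits the last shard
console.log("hashed:", hashedCounts); // writes are spread over all three shards
```

In the shell, a hashed key is declared when the collection is sharded, e.g. sh.shardCollection("<database>.<collection>", { _id: "hashed" }), with the namespace replaced by your own.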
I am new to MongoDB, but this is how I learned to do the same thing while following the MongoDB 202 class. There are other ways to balance traffic, such as moving, splitting and merging chunks. I haven't read anywhere that directly updating the tags collection in the config database is dirty or incorrect.
The code below is untested, and you will have to replace certain values such as nameofshardN and the minimum and maximum values. You will notice that I stop the balancer before modifying the tag ranges:
// Stop the balancer before touching the tag ranges
sh.stopBalancer()
// Remove the tags from the shards
sh.removeShardTag("nameofshardN", "rangeTime1")
sh.removeShardTag("nameofshardN+1", "rangeTime2")
// Remove the existing tag ranges (the tags collection lives in the config database)
use config
db.tags.remove({ "_id" : { "ns" : "testlab.range", "min" : { "_id" : 1 } }, "ns" : "testlab.range", "min" : { "_id" : 1 }, "max" : { "_id" : 100 }, "tag" : "rangeTime1" })
db.tags.remove({ "_id" : { "ns" : "testlab.range", "min" : { "_id" : 100 } }, "ns" : "testlab.range", "min" : { "_id" : 100 }, "max" : { "_id" : 200 }, "tag" : "rangeTime2" })
// Add the new tag ranges (replace the placeholder minimum and maximum values)
sh.addTagRange("testlab.range", {_id : new_minimum_value}, {_id : new_maximum_value}, "rangeTime1");
sh.addTagRange("testlab.range", {_id : new_minimum_value}, {_id : new_maximum_value}, "rangeTime2");
// Re-attach the tags to the shards (tag names are case-sensitive)
sh.addShardTag("nameofshardN", "rangeTime1")
sh.addShardTag("nameofshardN+1", "rangeTime2")
// Restart the balancer
sh.startBalancer()
Use sh.status() or mongostat --port 27017 --discover to verify balancing.
test.lab is not a valid database name (database names cannot contain a dot), so I changed my database name to testlab.
One additional remark about the _id field not being updated. When I query the tags collection like so:
mongos> db.tags.find({}, {_id:1})
{ "_id" : { "ns" : "testlab.range", "min" : { "_id" : 1 } } }
{ "_id" : { "ns" : "testlab.range", "min" : { "_id" : 100 } } }
It shows that the _id field contains the min value of the tag range. If you are updating the tags collection and modifying a tag's min field value, wouldn't it also be reasonable to update the _id field's min value as well?
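To make the point concrete, here is a minimal sketch in plain JavaScript, with no server involved. The document shape is inferred from the query output and the remove() filters shown earlier. buildTagDoc and moveTagRange are hypothetical helper names of mine, and the new bounds 50 and 150 are made-up values. Since MongoDB does not allow _id to be changed by an update, the sketch models a changed min as removing the old document and inserting a new one.

```javascript
// Build a config.tags-style document. The shape is inferred from the
// find() output above: _id is made up of the namespace and the min bound.
function buildTagDoc(ns, min, max, tag) {
  return { _id: { ns: ns, min: min }, ns: ns, min: min, max: max, tag: tag };
}

// _id cannot be modified in place, so a changed min is modelled as
// "remove the old document, insert a new one with a matching _id".
function moveTagRange(oldDoc, newMin, newMax) {
  return {
    remove: { _id: oldDoc._id },
    insert: buildTagDoc(oldDoc.ns, newMin, newMax, oldDoc.tag),
  };
}

const oldDoc = buildTagDoc("testlab.range", { _id: 1 }, { _id: 100 }, "rangeTime1");
const change = moveTagRange(oldDoc, { _id: 50 }, { _id: 150 }); // hypothetical new bounds

console.log(JSON.stringify(change.insert._id));
// prints {"ns":"testlab.range","min":{"_id":50}}
```

This keeps _id.min and min in step, which is what sh.addTagRange would presumably do for you when it writes the range itself.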
Best Answer
From what I can gather after some experiments, it seems that the prefix range is the one that must be split.