Math first: you moved 150M documents in 2 days, which is roughly 870 documents per second including metadata and indexes, with reads and writes all happening on the same machine. That is not what I would call slow. The description that comes to my mind is "lightning fast". ;)
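As a quick sanity check on that arithmetic (a sketch in Python; the figures are the ones from the question):

```python
docs = 150_000_000          # documents migrated
seconds = 2 * 24 * 60 * 60  # two days, in seconds
rate = docs / seconds
print(round(rate))          # roughly 868 documents per second
```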
Since the write load is not actually distributed, an easy way to speed things up is to add two or more machines.
A few notes: sharding production data onto non-replica-set shards is dangerous, to say the least. If one of the shards fails, the data it holds is unavailable until you get the shard up and running again. Worse, since there is no server to write to for that shard's key range, values in that range cannot be written at all. If the shard were a replica set, the failure would instead trigger the election of a new primary, to which all writes and (depending on the configuration) most or even all reads would go.
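For illustration, adding a shard that is backed by a replica set looks like this in the mongo shell (the replica set name `rs0` and the host names are hypothetical):

```
// Hypothetical three-member replica set serving as one shard;
// format is "replicaSetName/host1:port,host2:port,host3:port"
sh.addShard( "rs0/host1:27018,host2:27018,host3:27018" )
```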
_id can be used as a shard key; however, it should be hashed.
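Declaring a hashed shard key on _id is a one-liner (the namespace here is just illustrative):

```
// Hashed shard key on _id spreads writes evenly across shards
sh.shardCollection( "mydb.mycollection", { "_id" : "hashed" } )
```

The trade-off is that a hashed key gives you even write distribution but rules out efficient range queries on _id, since adjacent values land on different shards.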
There is just a slight issue with how you are passing the $minKey
values in; try this instead:
db.adminCommand( { split : "mydb.mycollection" , middle : { "region" : "region1", "foo" : MinKey , "bar" : MinKey } } );
db.adminCommand( { split : "mydb.mycollection" , middle : { "region" : "region2", "foo" : MinKey , "bar" : MinKey } } );
This got me the following layout:
sh.status()
--- Sharding Status ---
  sharding version: {
        "_id" : 1,
        "version" : 4,
        "minCompatibleVersion" : 4,
        "currentVersion" : 5,
        "clusterId" : ObjectId("53a2cd9d98b4ace818666544")
  }
  shards:
        {  "_id" : "shard0000",  "host" : "localhost:30000" }
        {  "_id" : "shard0001",  "host" : "localhost:30001" }
        {  "_id" : "shard0002",  "host" : "localhost:30002" }
  databases:
        {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
        {  "_id" : "mydb",  "partitioned" : true,  "primary" : "shard0001" }
                mydb.mycollection
                        shard key: { "region" : 1, "foo" : 1, "bar" : 1 }
                        chunks:
                                shard0000       1
                                shard0001       2
                        { "region" : { "$minKey" : 1 }, "foo" : { "$minKey" : 1 }, "bar" : { "$minKey" : 1 } } -->> { "region" : "region1", "foo" : { "$minKey" : 1 }, "bar" : { "$minKey" : 1 } } on : shard0000 Timestamp(2, 0)
                        { "region" : "region1", "foo" : { "$minKey" : 1 }, "bar" : { "$minKey" : 1 } } -->> { "region" : "region2", "foo" : { "$minKey" : 1 }, "bar" : { "$minKey" : 1 } } on : shard0001 Timestamp(2, 2)
                        { "region" : "region2", "foo" : { "$minKey" : 1 }, "bar" : { "$minKey" : 1 } } -->> { "region" : { "$maxKey" : 1 }, "foo" : { "$maxKey" : 1 }, "bar" : { "$maxKey" : 1 } } on : shard0001 Timestamp(2, 3)
The use of the $minKey (MinKey) and $maxKey (MaxKey) values is a bit tough to tease out (they are rarely used except internally), but there is a decent, illustrative example in the docs.
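The intuition: MinKey sorts before every other BSON value and MaxKey after every other BSON value, which is why they mark the open ends of chunk ranges. As a sketch, a query spanning the entire key space of a field can be written with them explicitly:

```
// MinKey < any value < MaxKey, so this matches every document
// that has a "region" field, regardless of its type or value
db.mycollection.find( { "region" : { $gt : MinKey, $lt : MaxKey } } )
```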
Best Answer
I assume you mean the primary shard is down, because "primary database" is not an applicable term here. The behavior is the same as losing any other shard, with one difference: both sharded and unsharded collections are affected, since a database's unsharded collections live entirely on its primary shard.