MongoDB presplitting chunks for compound shard key

mongodbsharding

In my mongodb setup I have a compound shard key {"region" : 1, "foo" : 1, "bar" : 1} and I know the values region can be and that each region should be on one chunk.

Therefore I'd like to pre-split based on the region key only. The sharding status should look like that afterwards:

{ "region": { "$minKey" : 1 }, "foo": { "$minKey" : 1 }, "bar": { "$minKey" : 1 } } -> { "region": region1, "foo": { "$minKey" : 1 }, "bar": { "$minKey" : 1 } } on: shard1
{ "region": region1, "foo": { "$minKey" : 1 }, "bar": { "$minKey" : 1 } } -> { "region": region2, "foo": { "$minKey" : 1 }, "bar": { "$minKey" : 1 } } on: shard2
{ "region": region2, "foo": { "$minKey" : 1 }, "bar": { "$minKey" : 1 } } -> { "region": { "$maxKey" : 1 }, "foo": { "$minKey" : 1 }, "bar": { "$minKey" : 1 } } on: shard3

I've tried some ways to achieve that, but nothing worked:

  • db.runCommand( { split : "mydb.mycollection" , middle : { "region" : "region1" } } ); will return an error, because the complete shard key have to be part of the split.
  • db.runCommand( { split : "mydb.mycollection" , middle : { "region" : "region1", "foo" : { "$minKey" : 1 }, "bar" : { "$minKey" : 1 } } } ); and db.runCommand( { split : "mydb.mycollection" , middle : { "region" : "region1", "foo" : "$minKey" , "bar" : "$minKey" } } ); will interpret the minKey as String and split on that base which is wrong.

How can I finally split chunks with a compound shard key on a single field base?!

Cheers.

Best Answer

Just a slight issue with how you are passing the $minKey values in, try this instead:

db.adminCommand( { split : "mydb.mycollection" , middle : { "region" : "region1", "foo" : MinKey , "bar" : MinKey } } );
db.adminCommand( { split : "mydb.mycollection" , middle : { "region" : "region2", "foo" : MinKey , "bar" : MinKey } } );

This got me the following layout:

sh.status()
--- Sharding Status --- 
  sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("53a2cd9d98b4ace818666544")
}
  shards:
    {  "_id" : "shard0000",  "host" : "localhost:30000" }
    {  "_id" : "shard0001",  "host" : "localhost:30001" }
    {  "_id" : "shard0002",  "host" : "localhost:30002" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "mydb",  "partitioned" : true,  "primary" : "shard0001" }
        mydb.mycollection
            shard key: { "region" : 1, "foo" : 1, "bar" : 1 }
            chunks:
                shard0000   1
                shard0001   2
            {
    "region" : { "$minKey" : 1 },
    "foo" : { "$minKey" : 1 },
    "bar" : { "$minKey" : 1 }
} -->> {
    "region" : "region1",
    "foo" : { "$minKey" : 1 },
    "bar" : { "$minKey" : 1 }
} on : shard0000 Timestamp(2, 0) 
            {
    "region" : "region1",
    "foo" : { "$minKey" : 1 },
    "bar" : { "$minKey" : 1 }
} -->> {
    "region" : "region2",
    "foo" : { "$minKey" : 1 },
    "bar" : { "$minKey" : 1 }
} on : shard0001 Timestamp(2, 2) 
            {
    "region" : "region2",
    "foo" : { "$minKey" : 1 },
    "bar" : { "$minKey" : 1 }
} -->> {
    "region" : { "$maxKey" : 1 },
    "foo" : { "$maxKey" : 1 },
    "bar" : { "$maxKey" : 1 }
} on : shard0001 Timestamp(2, 3) 

The use of the $minKey (MinKey) and $maxKey (MaxKey) values is a bit tough to tease out (they are rarely used except internally), but there is a decent, and illustrative example here in the docs.