MongoDB – sharding

mongodb sharding

I have a sharded cluster with 4 shards and 3 config servers. My problem is that when I do my inserts, only one shard gets filled. Can anyone please explain why this is happening? My first thought was that it has something to do with the shard key, but I tried a few and nothing changed.

mongos> db.flight.stats()
{
    "sharded" : true,
    "systemFlags" : 1,
    "userFlags" : 1,
    "ns" : "flights.flight",
    "count" : 2874004,
    "numExtents" : 17,
    "size" : 689760960,
    "storageSize" : 857440256,
    "totalIndexSize" : 211537648,
    "indexSizes" : {
            "_id_" : 93255456,
            "departureAirport_1" : 118282192
    },
    "avgObjSize" : 240,
    "nindexes" : 2,
    "nchunks" : 20,
    "shards" : {
        "shard0002" : {
                    "ns" : "flights.flight",
                    "count" : 2874004,
                    "size" : 689760960,
                    "avgObjSize" : 240,
                    "storageSize" : 857440256,
                    "numExtents" : 17,
                    "nindexes" : 2,
                    "lastExtentSize" : 227803136,
                    "paddingFactor" : 1,
                    "systemFlags" : 1,
                    "userFlags" : 1,
                    "totalIndexSize" : 211537648,
                    "indexSizes" : {
                            "_id_" : 93255456,
                            "departureAirport_1" : 118282192
                    },
                    "ok" : 1
            }
    },
    "ok" : 1
}

Sharding status information:

mongos> sh.status()

sharding version: {
    "_id" : 1,
    "version" : 4,
    "minCompatibleVersion" : 4,
    "currentVersion" : 5,
    "clusterId" : ObjectId("538c20fdbf2687fbbc582642")
}
shards:
    {  "_id" : "shard0000",  "host" : "cs-mongo-shard-1-east-t:10001" }
    {  "_id" : "shard0001",  "host" : "cs-mongo-shard-2-east-t:10001" }
    {  "_id" : "shard0002",  "host" : "10.1.10.15:10001" }
databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "flights",  "partitioned" : true,  "primary" : "shard0002" }
        flights.flight
            shard key: {
    "departureAirport" : 1,
    "arrivalAirport" : 1,
    "departureDate" : 1,
    "availability" : 1
}
        chunks:
            shard0002    1
    {
    "departureAirport" : { "$minKey" : 1 },
    "arrivalAirport" : { "$minKey" : 1 },
    "departureDate" : { "$minKey" : 1 },
    "availability" : { "$minKey" : 1 }
} -->> {
    "departureAirport" : { "$maxKey" : 1 },
    "arrivalAirport" : { "$maxKey" : 1 },
    "departureDate" : { "$maxKey" : 1 },
    "availability" : { "$maxKey" : 1 }
} on : shard0002 Timestamp(1, 0) 

Best Answer

OK, flights.flight is sharded, but so far it has only one chunk. All current data therefore lives on one shard, and all writes can only go to that shard. You will need more chunks before the collection can be balanced across multiple shards. Unless you pre-split (something like the linked question) or use a hashed shard key (which you are not), the mongos process is responsible for automatically splitting the chunks.

The mongos will attempt the first split after it has seen writes of approximately 20% of the max chunk size (the default is 64MiB, so 20% is ~13MiB). It looks like this has not yet happened, or has failed for some reason, in your case.
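To put that threshold next to the numbers in the question, here is a rough back-of-the-envelope calculation in plain JavaScript. The 20% figure is the approximation quoted above, the sizes come from the db.flight.stats() output, and firstSplitEstimate is a name made up for this sketch.

```javascript
// Hypothetical helper for illustration only; it is not part of MongoDB.
function firstSplitEstimate(avgObjSize, collectionSize) {
  var maxChunkBytes = 64 * 1024 * 1024;      // default max chunk size (64MiB)
  var firstSplitBytes = 0.2 * maxChunkBytes; // approximate writes seen before the first split attempt
  return {
    firstSplitMiB: firstSplitBytes / (1024 * 1024),
    docsBeforeFirstSplit: Math.floor(firstSplitBytes / avgObjSize),
    maxSizedChunksOfData: collectionSize / maxChunkBytes
  };
}

// Values from the stats output in the question: avgObjSize 240, size 689760960.
var estimate = firstSplitEstimate(240, 689760960);
```

With 240-byte documents, the first split attempt would be expected after roughly 56,000 inserts, and the collection already holds more than ten max-sized chunks' worth of data. That makes a failed split look more likely than one that is simply not due yet.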

It should be noted that although the mongos will track all the writes, it will lose its "history" if restarted. It should also be noted that the config servers must be up and working for any splits to happen. Check the mongos logs for any evidence of split failures or similar.

If you want to kick-start things manually, just pick two (or more) reasonable split points and use the sh.splitAt() helper to break up your current single chunk. You can then wait for the balancer to move the resulting chunks, or you can move them manually with the sh.moveChunk() helper. You can find examples of this in the answer to the question I linked above.
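As a minimal sketch of that manual route, the helper below splits at each chosen point and then moves the chunk starting at that point to a shard picked round-robin. The function name splitAndDistribute, the airport codes, and the use of MinKey for the trailing key fields are assumptions for illustration. The shell's sh object is passed in as a parameter so the logic can be tried against a stub first.

```javascript
// Hypothetical helper; `shell` is expected to be the mongo shell's global `sh` object.
function splitAndDistribute(shell, ns, splitPoints, shardIds) {
  var actions = [];
  splitPoints.forEach(function (point, i) {
    // Split the chunk that currently contains `point`, exactly at `point`.
    shell.splitAt(ns, point);
    // The new chunk starts at `point`, so the same document selects it for the move.
    var target = shardIds[i % shardIds.length];
    shell.moveChunk(ns, point, target);
    actions.push({ splitAt: point, movedTo: target });
  });
  return actions;
}

// Possible usage in the mongo shell (split points are made-up examples and
// should name every field of the shard key):
// splitAndDistribute(sh, "flights.flight",
//   [ { departureAirport: "JFK", arrivalAirport: MinKey, departureDate: MinKey, availability: MinKey },
//     { departureAirport: "ORD", arrivalAirport: MinKey, departureDate: MinKey, availability: MinKey } ],
//   [ "shard0000", "shard0001" ]);
```

The chunk below the first split point stays where it is (shard0002 here), so two split points and two target shards would leave one chunk on each of the three shards. If you would rather let the balancer do the moving, drop the moveChunk call and keep only the splits.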