MongoDB sharding


I have a question about MongoDB sharding.

I have 7 VMs, with 1 master mongos, 3 config servers, and 3 shard servers.

Mongo Main          - 192.168.212.173   port 27020

Mongo Shard1        - 192.168.212.175   port 27018
Mongo Shard2        - 192.168.212.176   port 27018
Mongo Shard3        - 192.168.212.177   port 27018

Mongo Config1       - 192.168.212.179   port 27019
Mongo Config2       - 192.168.212.180   port 27019
Mongo Config3       - 192.168.212.181   port 27019

I configured all the servers and set a shard key on the collection testDB.testData.
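
The setup commands were along these lines, run against the mongos (reconstructed here from the sh.status() output below, not a verbatim transcript):

    // reconstructed setup, run against the mongos (192.168.212.173:27020)
    sh.addShard("192.168.212.175:27018")
    sh.addShard("192.168.212.176:27018")
    sh.addShard("192.168.212.177:27018")

    sh.enableSharding("testDB")                         // mark the database as sharded
    sh.shardCollection("testDB.testData", { name: 1 })  // shard key: ascending "name"
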
On the master I log in with mongo 192.168.212.173:27020/admin (the master mongos).
When I run sh.status() I see the following:

sh.status()
--- Sharding Status --- 
  sharding version: {
    "_id" : 1,
    "version" : 3,
    "minCompatibleVersion" : 3,
    "currentVersion" : 4,
    "clusterId" : ObjectId("55a4c899d077ed690ce794ff")
}
  shards:
    {  "_id" : "shard0000",  "host" : "192.168.212.175:27018" }
    {  "_id" : "shard0001",  "host" : "192.168.212.176:27018" }
    {  "_id" : "shard0002",  "host" : "192.168.212.177:27018" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "testDB",  "partitioned" : true,  "primary" : "shard0000" }
        testDB.testData
            shard key: { "name" : 1 }
            chunks:
                shard0000   1
            { "name" : { "$minKey" : 1 } } -->> { "name" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0) 
    {  "_id" : "db",  "partitioned" : false,  "primary" : "shard0001" }
    {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0001" }

I insert about 10,000 records with the following JSON shape, where name is a unique string:

{
    "_id" : ObjectId("55a4e563c27f4a62558b4567"),
    "name" : "EDBfuGq57A"
}

I insert this with the following command:

$collection_sharding->save($mongo_data);

where $mongo_data is the JSON document shown above.
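
For reference, the same kind of test data can be generated directly from the mongo shell; the random-name helper here is purely illustrative, not the actual generator used:

    // connect through mongos, e.g. mongo 192.168.212.173:27020/testDB
    // illustrative generator -- the real data was inserted via the PHP driver
    function randomName(len) {
        var chars = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
        var s = "";
        for (var i = 0; i < len; i++) {
            s += chars.charAt(Math.floor(Math.random() * chars.length));
        }
        return s;
    }

    for (var i = 0; i < 10000; i++) {
        db.testData.insert({ name: randomName(10) });   // e.g. { name: "EDBfuGq57A" }
    }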

If I look at where the data is stored, I see that it is only stored on Shard1 (192.168.212.175).
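
One way to verify this from the mongos shell is the standard getShardDistribution() helper:

    use testDB
    db.testData.getShardDistribution()
    // reports documents and chunks per shard; here everything shows up on shard0000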

sh.getBalancerState() returns true.

When I look at the mongostat output of all three shard servers, I see that only Shard1 shows activity on testDB:

mongostat --host 192.168.212.177:27018,192.168.212.176:27018,192.168.212.175:27018

                         insert  query update delete getmore command flushes mapped  vsize    res faults   locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time
192.168.212.175:27018       *0     *0     *0     *0       0     2|0       0   288m   835m   125m      0 testDB:0.0%          0       0|0     0|0   120b     2k    11   13:06:23 
192.168.212.176:27018       *0     *0     *0     *0       0     2|0       0    80m   416m    48m      0   test:0.0%          0       0|0     0|0   120b     2k    10   13:06:23 
192.168.212.177:27018       *0     *0     *0     *0       0     2|0       0    80m   398m    44m      0  local:0.0%          0       0|0     0|0   120b     2k    10   13:06:23 

192.168.212.175:27018       *0     *0     *0     *0       0     1|0       0   288m   835m   125m      0 testDB:0.0%          0       0|0     0|0    62b     2k    11   13:06:24 
192.168.212.176:27018       *0     *0     *0     *0       0     1|0       0    80m   416m    48m      0   test:0.0%          0       0|0     0|0    62b     2k    10   13:06:24 
192.168.212.177:27018       *0     *0     *0     *0       0     1|0       0    80m   398m    44m      0  local:0.0%          0       0|0     0|0    62b     2k    10   13:06:24 

After the insert of the 10,000 rows I noticed that the sharding appears to have stopped:

sh.status()
--- Sharding Status --- 
  sharding version: {
    "_id" : 1,
    "version" : 3,
    "minCompatibleVersion" : 3,
    "currentVersion" : 4,
    "clusterId" : ObjectId("55a50555fe726cefcf33b2f1")
}
  shards:
    {  "_id" : "shard0000",  "host" : "192.168.212.175:27018" }
    {  "_id" : "shard0001",  "host" : "192.168.212.176:27018" }
    {  "_id" : "shard0002",  "host" : "192.168.212.177:27018" }
  databases:
    {  "_id" : "admin",  "partitioned" : false,  "primary" : "config" }
    {  "_id" : "test",  "partitioned" : false,  "primary" : "shard0000" }
    {  "_id" : "testDB",  "partitioned" : true,  "primary" : "shard0000" }
    {  "_id" : "db",  "partitioned" : false,  "primary" : "shard0001" }

Could anybody explain to me why Shard2 and Shard3 are not filling with data for database testDB, and why the sharding has stopped?

Kind Regards,

Arjan Kroon

Best Answer

In order for data to be distributed across multiple shards, you need to have multiple chunks for a given sharded collection. A chunk is a contiguous range of shard key values representing approximately 64 megabytes of data by default.
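
The chunk size currently in effect can be checked from the mongos shell via the config database, for example:

    use config
    db.settings.find({ _id: "chunksize" })
    // no result means the 64 MB default is in effect;
    // { "_id" : "chunksize", "value" : <MB> } means it has been overridden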

Chunk ranges are normally created automatically based on observation of the shard key values and size of documents being inserted through mongos. When documents are added to a chunk range which is approaching the maximum chunk size, mongos will trigger a chunk split which will try to create smaller logical chunks for balancing.
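
For small test datasets you do not have to wait for automatic splits: chunks can also be split manually from mongos. A sketch (the split points here are arbitrary examples):

    // split points chosen purely for illustration
    sh.splitFind("testDB.testData", { name: "EDBfuGq57A" })  // split the chunk containing this key at its median
    sh.splitAt("testDB.testData", { name: "M" })             // split at an explicit shard key value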

Currently all data in your sharded testDB.testData collection is covered by a single chunk on shard0000 (aka your Shard1 server):

    testDB.testData
        shard key: { "name" : 1 }
        chunks:
            shard0000   1
        { "name" : { "$minKey" : 1 } } -->> { "name" : { "$maxKey" : 1 } } on : shard0000 Timestamp(1, 0) 

Using Object.bsonsize() in the mongo shell to determine document size, your sample document in the question is about 43 bytes in BSON. With only 10,000 documents you would not be close to reaching the 64 megabytes of data required to trigger any chunk splits. Something closer to 1.5 million documents of that size would be required.
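
The numbers work out roughly as follows in the mongo shell:

    Object.bsonsize({ _id: ObjectId("55a4e563c27f4a62558b4567"), name: "EDBfuGq57A" })
    // 43 (bytes)

    64 * 1024 * 1024 / 43
    // ~1560671 -- documents of this size needed to fill one default 64 MB chunk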

You will also need to ensure the values for your chosen shard key have reasonable uniqueness. A poor shard key (i.e. with many common values) can lead to indivisible jumbo chunks that cannot be further split.
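
Chunks that have been marked as jumbo can be listed from the mongos shell via the config database:

    use config
    db.chunks.find({ ns: "testDB.testData", jumbo: true })
    // an empty result means no chunks have been flagged as jumbo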

If you just want to test how sharding works you could: