Mongodb – Chunks not splitting on MongoDB sharded clusters

linuxmongodbsharding

I have created 2 shard servers named rs1 and rs2.
Now in the database collection there are multiple records those should split on these 2 servers in chunks. The sharding key is TxnMonth. It is not unique.

Now I have given the chunk size as 1 MB. When I inserted records corresponding to "TxnMonth" : 1, "TxnMonth" : 2 and "TxnMonth" : 3, the data was splitted onto 2 shards where on rs1 there were 2 chunks and on rs2 there was 1 chunk.

On checking, data for TxnMonth : 1 and 2 was on rs1 and data for TxnMonth : 3 was on rs2.

On running, sh.status(), it gave this output-

{  "_id" : "shardPOC",  "primary" : "rs1",  "partitioned" : true }
                shardPOC.machines
                        shard key: { "TxnMonth" : 1 }
                        unique: false
                        balancing: true
                        chunks:
                                rs1     2
                                rs2     1
                        { "TxnMonth" : { "$minKey" : 1 } } -->> { "TxnMonth" : 1 } on : rs1 Timestamp(2, 1)
                        { "TxnMonth" : 1 } -->> { "TxnMonth" : 3 } on : rs1 Timestamp(1, 2)
                        { "TxnMonth" : 3 } -->> { "TxnMonth" : { "$maxKey" : 1 } } on : rs2 Timestamp(2, 0)
        {  "_id" : "NewTestDB",  "primary" : "rs1",  "partitioned" : false }

After this, I started inserting records for TxnMonth : 4, 5, 6, 7 and 8, to check when the chunk on rs2 will split but even if the chunk size exceeded 1 MB size, it didn't split. Below is the output of db.collection.getShardDistribution()

Shard rs1 at rs1/JCB-DB1:27012
   data : 57KiB docs : 80 chunks : 2
   estimated data per chunk : 28KiB
   estimated docs per chunk : 40
Shard rs2 at rs2/JCB-DB1:27013
   data : 1.1MiB docs : 1560 chunks : 1
   estimated data per chunk : 1.1MiB
   estimated docs per chunk : 1560
Totals
   data : 1.15MiB docs : 1640 chunks : 3
   Shard rs1 contains 4.87% data, 4.87% docs in cluster, avg obj size on shard : 740B
   Shard rs2 contains 95.12% data, 95.12% docs in cluster, avg obj size on shard : 740B

What I expected was that the chunk will split after it exceeds 1 MB but it did not.
Can someone explain the reason?
Thanks in advance.

Best Answer

Your chosen shard key for a collection defines the granularity of possible chunk splits. Chunks represent a range of values for the shard key. With a single field in your shard key (TxnMonth), that means the smallest possible range is a single value such as TxnMonth: 3.

The chunk size setting determines when MongoDB will try to split chunk ranges in order to help distribute data in a sharded cluster, but does not put an upper bound on the size of the data in a chunk range.

When the maximum chunk size is reached for a given value, that chunk will be marked as an indivisible jumbo chunk and the balancer will no longer attempt to split or migrate that chunk. New data will continue to be added to that chunk leading to an imbalance in data distribution as you observed in your testing.

there are 8 distinct values of TxnMonth from 1 to 8

This limits your sharded collection to a maximum of 8 chunks (or the number of unique TxnMonth values in this collection).

For effective sharding you need to choose a more appropriate shard key with higher cardinality.