I have created 2 shard servers named rs1 and rs2.
Now in the database collection there are multiple records those should split on these 2 servers in chunks. The sharding key is TxnMonth
. It is not unique.
Now I have given the chunk size as 1 MB. When I inserted records corresponding to "TxnMonth" : 1
, "TxnMonth" : 2
and "TxnMonth" : 3
, the data was splitted onto 2 shards where on rs1 there were 2 chunks and on rs2 there was 1 chunk.
On checking, data for TxnMonth
: 1 and 2 was on rs1 and data for TxnMonth
: 3 was on rs2.
On running, sh.status()
, it gave this output-
{ "_id" : "shardPOC", "primary" : "rs1", "partitioned" : true }
shardPOC.machines
shard key: { "TxnMonth" : 1 }
unique: false
balancing: true
chunks:
rs1 2
rs2 1
{ "TxnMonth" : { "$minKey" : 1 } } -->> { "TxnMonth" : 1 } on : rs1 Timestamp(2, 1)
{ "TxnMonth" : 1 } -->> { "TxnMonth" : 3 } on : rs1 Timestamp(1, 2)
{ "TxnMonth" : 3 } -->> { "TxnMonth" : { "$maxKey" : 1 } } on : rs2 Timestamp(2, 0)
{ "_id" : "NewTestDB", "primary" : "rs1", "partitioned" : false }
After this, I started inserting records for TxnMonth
: 4, 5, 6, 7 and 8, to check when the chunk on rs2 will split but even if the chunk size exceeded 1 MB size, it didn't split. Below is the output of db.collection.getShardDistribution()
–
Shard rs1 at rs1/JCB-DB1:27012
data : 57KiB docs : 80 chunks : 2
estimated data per chunk : 28KiB
estimated docs per chunk : 40
Shard rs2 at rs2/JCB-DB1:27013
data : 1.1MiB docs : 1560 chunks : 1
estimated data per chunk : 1.1MiB
estimated docs per chunk : 1560
Totals
data : 1.15MiB docs : 1640 chunks : 3
Shard rs1 contains 4.87% data, 4.87% docs in cluster, avg obj size on shard : 740B
Shard rs2 contains 95.12% data, 95.12% docs in cluster, avg obj size on shard : 740B
What I expected was that the chunk will split after it exceeds 1 MB but it did not.
Can someone explain the reason?
Thanks in advance.
Best Answer
Your chosen shard key for a collection defines the granularity of possible chunk splits. Chunks represent a range of values for the shard key. With a single field in your shard key (
TxnMonth
), that means the smallest possible range is a single value such asTxnMonth: 3
.The chunk size setting determines when MongoDB will try to split chunk ranges in order to help distribute data in a sharded cluster, but does not put an upper bound on the size of the data in a chunk range.
When the maximum chunk size is reached for a given value, that chunk will be marked as an indivisible
jumbo
chunk and the balancer will no longer attempt to split or migrate that chunk. New data will continue to be added to that chunk leading to an imbalance in data distribution as you observed in your testing.This limits your sharded collection to a maximum of 8 chunks (or the number of unique
TxnMonth
values in this collection).For effective sharding you need to choose a more appropriate shard key with higher cardinality.