MongoDB – Is the chunk size a limit on the maximum data size that can be associated with a shard key value?

clustering, mongodb, sharding

I haven't read anything about the relationship between chunk size and the amount of data associated with a single shard key value in a collection. Another way to ask the question: can a shard key span more than one chunk?

Best Answer

can a shard key span more than one chunk?

No. A chunk in MongoDB is a logical contiguous range of one or more shard key values. A given shard key value cannot exist in more than a single chunk range, but many documents may have the same shard key value. The cardinality of your chosen shard key determines the granularity of documents that can be represented in a chunk range (with the finest granularity being a chunk that represents a single shard key value).

The relationship between shard key and chunks is illustrated in the following diagram from the MongoDB documentation explaining Choosing a Shard Key:

[Diagram: Shard Key vs Chunk Ranges]
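To make the range idea concrete, here is a minimal Python sketch of how a shard key value maps to exactly one chunk. It is a simplified model, not MongoDB's implementation: the integer shard key, the `SPLIT_POINTS` values, and the `chunk_for` helper are all hypothetical, chosen only for illustration.

```python
from bisect import bisect_right

# Hypothetical chunk boundaries for an integer shard key.
# Each chunk covers a half-open range [lower, upper).
MIN_KEY, MAX_KEY = float("-inf"), float("inf")
SPLIT_POINTS = [100, 200, 300]


def chunk_for(value):
    """Return the (lower, upper) range of the single chunk that owns `value`."""
    bounds = [MIN_KEY] + SPLIT_POINTS + [MAX_KEY]
    # bisect_right counts the split points that are <= value, which is
    # the index of the chunk whose lower bound is inclusive of value.
    i = bisect_right(SPLIT_POINTS, value)
    return bounds[i], bounds[i + 1]


# Every document with shard key value 150 resolves to the same chunk,
# no matter how many such documents exist.
print(chunk_for(150))
print(chunk_for(100))
```

Because the ranges do not overlap, a lookup can never return two chunks for the same value, which is why a single shard key value cannot span chunks.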

Is the chunk size a limit to the max data size that can be associated with a shard key value?

No. The chunk size determines the approximate total size of the documents expected to be represented by a chunk range (64MB by default in older releases; 128MB starting with MongoDB 6.0). If a chunk range is observed to be approaching (or possibly exceeding) the configured chunk size, MongoDB will attempt to split that single chunk into multiple chunks representing smaller contiguous ranges of the shard key. Chunk splits update the sharded cluster metadata but do not involve copying documents between shards. MongoDB's sharded cluster balancer uses migration thresholds to determine when it is appropriate to migrate chunks and associated documents between available shards.
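The split decision can be sketched roughly as follows. This is an illustrative model under stated assumptions, not MongoDB's actual split algorithm: `split_point` is a hypothetical helper, documents are represented as `(shard_key_value, size_in_bytes)` pairs, and the split is placed at the first key boundary at or past half of the chunk's data.

```python
def split_point(docs, max_chunk_bytes):
    """Pick a shard key value at which to split an oversized chunk.

    docs: list of (shard_key_value, size_in_bytes) pairs in one chunk.
    Returns the lower bound of the new right-hand chunk, or None if the
    chunk is within the limit or has no key boundary to split at.
    """
    total = sum(size for _, size in docs)
    if total <= max_chunk_bytes:
        return None  # chunk is within the configured size

    # Total bytes held under each distinct shard key value.
    size_by_key = {}
    for key, size in docs:
        size_by_key[key] = size_by_key.get(key, 0) + size

    keys = sorted(size_by_key)
    if len(keys) < 2:
        return None  # only one distinct key value: nowhere to split

    # Splits fall on key boundaries, so documents that share a shard key
    # value always stay together in the same chunk.
    running = 0
    for i in range(len(keys) - 1):
        running += size_by_key[keys[i]]
        if running >= total / 2:
            return keys[i + 1]
    return keys[-1]


print(split_point([(1, 30), (2, 30), (3, 30)], max_chunk_bytes=64))
```

Note that the function only returns a boundary value. In this model, as in the description above, a split changes range metadata and moves no documents.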

If MongoDB cannot split a chunk that exceeds the configured chunk size (for example, due to a low-cardinality shard key), that chunk will be marked as jumbo. Jumbo chunks are problematic because they lead to an imbalance of data distribution within your sharded cluster: the balancer will no longer attempt to split or migrate jumbo chunks.
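The following sketch shows why the chunk size is not a hard cap on the data behind one shard key value. It is a simplified model with a hypothetical `classify_chunk` helper and hypothetical labels; documents are again `(shard_key_value, size_in_bytes)` pairs.

```python
def classify_chunk(docs, max_chunk_bytes):
    """Label a chunk as 'ok', 'splittable', or 'jumbo' (illustrative only).

    docs: list of (shard_key_value, size_in_bytes) pairs in one chunk.
    """
    total = sum(size for _, size in docs)
    if total <= max_chunk_bytes:
        return "ok"
    distinct_keys = {key for key, _ in docs}
    # An oversized chunk can only be divided between distinct key values.
    if len(distinct_keys) > 1:
        return "splittable"
    # All documents share one shard key value: the data is still stored,
    # but the chunk cannot be split and is treated as jumbo.
    return "jumbo"


print(classify_chunk([("US", 50), ("US", 50)], max_chunk_bytes=64))
print(classify_chunk([("US", 50), ("CA", 50)], max_chunk_bytes=64))
```

In this model the oversized single-value chunk is not rejected; it simply becomes unsplittable, which matches the jumbo behaviour described above.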

The default chunk size works well for most deployments; however, it is possible to modify the chunk size if there is a more appropriate setting for your deployment or use case. A smaller chunk size will lead to more frequent chunk split and migration activity; a larger chunk size will lead to more I/O for individual chunk operations such as migrations. The documentation for modifying the chunk size notes several considerations which are worth reviewing before changing this setting.
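As a rough back-of-the-envelope illustration of that trade-off, the sketch below estimates how many chunks a collection needs at different chunk sizes. The `estimated_chunk_count` helper and the 10 GiB figure are hypothetical, and the result is only a lower bound that assumes every chunk is completely full; real clusters will typically hold more chunks than this.

```python
import math


def estimated_chunk_count(data_bytes, chunk_size_mb):
    """Lower bound on the number of chunks needed to hold `data_bytes`."""
    chunk_bytes = chunk_size_mb * 1024 * 1024
    return math.ceil(data_bytes / chunk_bytes)


ten_gib = 10 * 1024 ** 3
for size_mb in (32, 64, 128):
    # Halving the chunk size doubles the number of chunks to track,
    # split, and potentially migrate.
    print(size_mb, estimated_chunk_count(ten_gib, size_mb))
```

More chunks mean more split and migration events but less data moved per event; fewer, larger chunks mean the opposite.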