So, after the loop finishes, db.collection.getShardDistribution() shows that all chunks are located on only one of the shards; a few minutes later, however, the chunks are distributed evenly between shards. Here is my question: shouldn't mongos distribute chunks between shards during execution of the loop, instead of directing them all to one shard?
The issue is that you are using a monotonically increasing shard key. This results in all inserts targeting the single shard (a "hot shard") that currently owns the chunk range covering the highest shard key values. Data will eventually be rebalanced to the other shards, but this is not an effective use of sharding for write scaling. You need to choose a more appropriate shard key. If your use case does not require range queries, you could also consider using a hashed shard key on the address field. See Hashed vs Ranged Sharding for an illustration of the expected outcomes.
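For reference, here is a minimal sketch of enabling a hashed shard key in the mongo shell (the database and collection names are placeholders for illustration):

    // Shard the collection on a hash of the address field so inserts
    // spread across chunk ranges instead of hitting one hot shard.
    sh.enableSharding("mydb")
    sh.shardCollection("mydb.test", { address: "hashed" })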
Another related case: I am trying to import a large MongoDB database in BSON format using mongorestore, from outside the Docker network: mongorestore --host 127.19.0.150:3300 -d import1 -c test /path/base.bson. The import works, but all the chunks end up on one of the shards.
If the outcome is similar (all inserts going to a single shard and then being rebalanced), this also suggests a poor shard key choice.
If you are bulk inserting into an empty sharded collection, there is an approach you can use to minimize rebalancing: pre-splitting chunk ranges based on the known distribution of shard key values in your existing data, as sketched below.
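As a rough sketch (the namespace, shard key field, split points, and shard names are all hypothetical here), pre-splitting in the mongo shell looks like this:

    // Shard the empty collection on a ranged key first.
    sh.shardCollection("import1.test", { userId: 1 })

    // Split at boundaries chosen from the known distribution of the
    // existing data, so each shard can own a range before the import.
    sh.splitAt("import1.test", { userId: 25000 })
    sh.splitAt("import1.test", { userId: 50000 })
    sh.splitAt("import1.test", { userId: 75000 })

    // Move the empty chunks to their target shards up front; moving
    // empty chunks is cheap compared to rebalancing after the import.
    sh.moveChunk("import1.test", { userId: 0 },     "shard0000")
    sh.moveChunk("import1.test", { userId: 25000 }, "shard0001")
    sh.moveChunk("import1.test", { userId: 50000 }, "shard0002")
    sh.moveChunk("import1.test", { userId: 75000 }, "shard0003")

With the chunk ranges already assigned, a bulk import writes to all shards in parallel rather than funnelling through one and relying on the balancer afterwards.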
The database schema consists of multiple fields. I have chosen one field with the Int32 datatype as the shard key, but its cardinality is very low: 15% of documents share the same value for it. Could this be the source of the problem?
Low cardinality shard keys will definitely cause issues with data distribution. The most granular chunk range possible represents a single shard key value, so if a large percentage of your documents share the same shard key value, they will eventually lead to indivisible jumbo chunks, which are ignored by the balancer.
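To gauge how skewed a candidate key is before committing to it, you could run an aggregation along these lines (the field name is a placeholder):

    // Count documents per shard key value; a dominant value signals
    // future jumbo chunks, since a chunk cannot split below one value.
    db.test.aggregate([
      { $group: { _id: "$myShardKeyField", count: { $sum: 1 } } },
      { $sort: { count: -1 } },
      { $limit: 10 }
    ])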
All containers run on a computer with 32 GB of RAM and an i7-6700HQ. Could a slow HDD be a bottleneck resulting in such slow chunk migration?
There isn't enough information to determine whether your disk is the most limiting factor, but running a sharded cluster on a single computer with a slow HDD will certainly add resource contention challenges. Choosing appropriate shard keys should minimize the need for data migration unless you are adding or removing shards from your deployment.
Assuming you are using a recent version of MongoDB with the WiredTiger storage engine as the default (MongoDB 3.2+), you will definitely want to explicitly set --wiredTigerCacheSizeGB to limit the internal cache size for your mongod instances. See: To what size should I set the WiredTiger internal cache?.
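For instance (the port, path, replica set name, and the 2 GB figure are illustrative assumptions, not recommendations), each shard's mongod could be started with:

    # Cap the WiredTiger cache so several mongod instances can
    # coexist in 32 GB of RAM without starving each other.
    mongod --shardsvr --replSet rs0 --port 27018 \
           --dbpath /data/shard0 --wiredTigerCacheSizeGB 2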
Best Answer
Replacing mongodb:// at the start with ClusterADS-shard-0/ (which is the replica set) and removing /admin?... made it work.
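That matches the replicaSetName/host1:port,host2:port form that mongorestore accepts for --host. A sketch of the working invocation (the hostnames and ports are placeholders; the database, collection, and path are reused from the earlier command):

    # --host takes the replica set name prefix plus a plain host list,
    # not a mongodb:// URI with connection options.
    mongorestore --host "ClusterADS-shard-0/host1:27017,host2:27017" \
                 -d import1 -c test /path/base.bson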