MongoDB – How to set balancing as the main task for a sharded MongoDB cluster

mongodb sharding

I have a big collection (over 400 million records) that I want to divide between 4 shards. I created the infrastructure (4 shards, a config server, and mongos) and everything works: the balancer is running and data is being moved. It has been running for more than 15 hours and is still going. My shards have 2662, 77, 77, 77 chunks. It is too slow!
As far as I know, the balancer tries to keep MongoDB available. But right now nobody is using the database, and my main task is to finish balancing ASAP.
Does anybody know a command that makes balancing the top-priority task?

Best Answer

My shards have 2662, 77, 77, 77 chunks.

If you are creating a new sharded collection and want to minimise the time to re-balance a large amount of data, the recommended approach would be to pre-split chunks in an empty collection before inserting the data. In your example, you would have pre-split based on the data distribution so each shard would have an equal number of chunks representing about 1/4 of the data.
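As a rough sketch of what pre-splitting could look like, the snippet below computes evenly spaced split points and applies them with sh.splitAt. It assumes a numeric shard key with roughly uniform values; the namespace "mydb.records", the key name "userId", the key range, and the splitPoints helper are all made up for illustration. The collection would need to be sharded (and still empty) before the splits are applied.

```javascript
// Hypothetical helper: evenly spaced split points for a numeric shard key.
// Assumes keys are spread roughly uniformly between minKey and maxKey.
function splitPoints(minKey, maxKey, chunkCount) {
  var points = [];
  var step = (maxKey - minKey) / chunkCount;
  for (var i = 1; i < chunkCount; i++) {
    points.push(Math.floor(minKey + i * step));
  }
  return points;
}

// Only runs inside a mongo shell connected to mongos, where `sh` is defined.
// "mydb.records" and "userId" are placeholder names.
if (typeof sh !== "undefined") {
  splitPoints(0, 400000000, 8).forEach(function (p) {
    sh.splitAt("mydb.records", { userId: p });
  });
}
```

With 8 chunks across 4 shards, each shard would end up with 2 chunks once the empty chunks are distributed. Moving empty chunks is cheap, so the balancer (or manual sh.moveChunk calls) should finish that quickly before the data load starts.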

Assuming you have chosen a suitable shard key for data distribution going forward, there should not be a significant number of regular chunk migrations unless you are adding or removing shards.

Does anybody know a command that makes balancing the top-priority task?

If your shards are backed by replica sets you can make the balancer more aggressive by disabling _secondaryThrottle, which reduces the write concern used for documents migrating between shards. By default, _secondaryThrottle is true which is equivalent to a {w:2} write concern: each document move during chunk migration propagates to at least one secondary before the balancer proceeds with the next document. In MongoDB 3.0+, there is also an option to configure an explicit writeConcern for the secondaryThrottle operation.

For example, to disable _secondaryThrottle from a mongos shell:

db.getSiblingDB('config').settings.update(
   { "_id" : "balancer" },
   { $set : { "_secondaryThrottle" : false } },
   { upsert : true }
)

Balancer setting updates will not take immediate effect if there is currently a balance round / migration in progress, so you may want to disable & re-enable the balancer.
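One way to disable and re-enable the balancer from a mongos shell is sketched below, using the shell's built-in sh helper. The restartBalancer wrapper is a hypothetical name added here only to keep the steps together; the three sh calls can equally be typed one by one.

```javascript
// Hypothetical wrapper around the mongo shell's `sh` helper.
// stopBalancer() disables the balancer and waits for any in-progress
// balancing round to finish; startBalancer() enables it again.
function restartBalancer(shHelper) {
  shHelper.stopBalancer();
  shHelper.startBalancer();
  return shHelper.getBalancerState(); // true when the balancer is enabled
}

// Only runs inside a mongo shell connected to mongos, where `sh` is defined.
if (typeof sh !== "undefined") {
  print("Balancer enabled: " + restartBalancer(sh));
}
```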

For normal usage the default _secondaryThrottle behaviour is recommended to ensure documents have propagated and reduce the impact of balancing. If your replica sets have more than 3 members, you could also consider increasing the writeConcern to majority (see: Write Concerns for Replica Sets).
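For MongoDB 3.0+, setting an explicit write concern for the throttle might look like the sketch below, which mirrors the update shown earlier but stores a write concern document instead of a boolean. Treat the exact document shape as an assumption to check against the documentation for your server version; the throttleUpdate variable name is only for illustration.

```javascript
// Update fragment: throttle migrations with an explicit write concern
// (assumes MongoDB 3.0+, where _secondaryThrottle may be a document).
var throttleUpdate = { $set: { "_secondaryThrottle": { w: "majority" } } };

// Only runs inside a mongo shell connected to mongos, where `db` is defined.
if (typeof db !== "undefined") {
  db.getSiblingDB("config").settings.update(
    { "_id": "balancer" },
    throttleUpdate,
    { upsert: true }
  );
}
```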