MongoDB – Heavy load on one of the shards in MongoDB

mongodb · performance · sharding

I have a problem with one of our shards. For some reason I'm seeing the following load:

15:38:42 up 4 days, 10:28, 1 user, load average: 3.67, 3.50, 3.56

Meanwhile, all the other shards are fine. Their load average is approximately:

16:49:58 up 11 days, 56 min, 1 user, load average: 0.03, 0.03, 0.05

I have 3 config servers, 2 query routers (mongos), and 7 shards (each of them a single-member replica set; don't ask why).

I have 2 databases: 20 GB and 30 GB. All servers have the same configuration, and each database contains only one collection. In the MMS control panel I see a high:

Write Lock percentage

Could the problem be here?
How can I investigate this issue and fix it?

As I'm new to MongoDB, I used the following Stack Overflow question to configure my shards:

https://stackoverflow.com/questions/19672302/how-to-programatically-pre-split-a-guid-based-shard-key-with-mongodb

Please see the accepted answer there. Alternatively, how can I check my shard key myself?

Best Answer

The problem explained

As per your comment, your shard key is the _id field of the document. This field is monotonically increasing, basically like an incremented integer.

Put simply, sharding works this way: documents are stored in chunks. Those chunks are spread over the cluster based on ranges of the shard key. Let's look at a simple example:

  • s1: chunks holding id 0-50
  • s2: chunks holding id 51-100
  • s3: chunks holding id 101-?

And here is where the problem occurs: there is one shard where all new documents go. What happens then is that new chunks are created on it all the time (as chunks have a fixed maximum size). So not only are all writes focused on one server, that server also has to do the extra splitting work. After a certain threshold is met, the cluster starts to move chunks to the other shards, though this only balances out disk usage. In our example, it might then look like this:

  • s1: chunks holding id 0-100
  • s2: chunks holding id 101-200
  • s3: chunks holding id 201-?

Obviously, all write operations will still go to s3.
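The routing behaviour above can be sketched in a few lines of Python (an illustration of range-based chunk routing, not MongoDB's actual code): with contiguous key ranges and a monotonically increasing shard key, every new document falls into the open-ended top range.

```python
import bisect

# Hypothetical chunk layout after balancing, as in the example above:
# s1 owns keys up to 100, s2 owns 101-200, s3 owns 201 and above.
split_points = [101, 201]
shards = ["s1", "s2", "s3"]

def route(shard_key):
    """Pick the shard whose range contains the key (simplified mongos routing)."""
    return shards[bisect.bisect_right(split_points, shard_key)]

# Monotonically increasing keys: every new insert lands on the last shard.
targets = [route(key) for key in range(201, 251)]
print(set(targets))  # -> {'s3'}
```

However many chunks the balancer moves away, the top open-ended range stays on one shard, so that shard keeps absorbing every insert.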

What went wrong?

You chose the wrong shard key. Monotonically increasing shard keys lead to the problem explained above. Although ObjectIds look like hash sums, they aren't: they are monotonically increasing, because their leading bytes are a timestamp.
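You can see this directly: per the BSON specification, the first 4 bytes of an ObjectId are a big-endian Unix timestamp in seconds. A small Python sketch (building only that timestamp prefix, not a full ObjectId) shows that ids generated later always sort higher:

```python
import struct

def objectid_prefix(unix_ts):
    """First 4 bytes of an ObjectId: big-endian seconds since the epoch.
    (The remaining 8 bytes - a random value plus a counter - are omitted here.)"""
    return struct.pack(">I", unix_ts).hex()

# Prefixes for documents created one second apart:
prefixes = [objectid_prefix(ts) for ts in (1700000000, 1700000001, 1700000002)]
print(prefixes)
print(prefixes == sorted(prefixes))  # -> True
```

Since the timestamp dominates the ordering, every freshly inserted `_id` is greater than all existing ones, which is exactly the monotonic pattern that creates the hot shard.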

What can be done?

You need a better shard key. Since a shard key cannot be changed after a collection is sharded, migrating to a new shard key is not a simple thing to do. However, there is a quite detailed explanation in MongoDB's ticket system. Basically, it works like this:

  1. Remove all but one shard from the cluster
  2. Wait for the chunk migrations to be finished
  3. Shut down the remaining shard
  4. Change the shard key through a mongos instance
  5. Restart the last shard
  6. Add the other shards to the cluster
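When you re-shard, a common fix for monotonically increasing ids is a hashed shard key (e.g. `sh.shardCollection("mydb.mycoll", { _id: "hashed" })` on an empty collection). The effect can be illustrated in Python; note that MD5 here merely stands in for MongoDB's internal 64-bit hash, and the three-way bucketing is a simplification of chunk ranges over hashed values:

```python
import hashlib
from collections import Counter

def hashed_bucket(shard_key, num_shards=3):
    """Hash the key and bucket it - roughly the effect of a hashed shard key.
    (MongoDB uses its own 64-bit hash function, not MD5; this is illustrative.)"""
    digest = hashlib.md5(str(shard_key).encode()).hexdigest()
    return "s%d" % (int(digest, 16) % num_shards + 1)

# The same monotonically increasing keys now spread across all shards:
counts = Counter(hashed_bucket(k) for k in range(201, 251))
print(counts)  # all three shards receive a share of the writes
```

Hashing destroys the insertion order, so consecutive ids land on different shards and the write load is spread out, at the cost of making range queries on the shard key scatter across the cluster.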