MongoDB – Should MongoDB Balancer Be Disabled to Prevent moveChunk Block?

mongodb

In my team, we have a MongoDB 3.6 cluster with 4 shards.
My teammate set up Ops Manager, and we found that at some point queries become slow and they are all waiting on a database-level lock.

    "Database": {
      "acquireCount": {
        "w": 1
      },
      "acquireWaitCount": {
        "w": 1
      },
      "timeAcquiringMicros": {
        "w": 5383162
      }
    },

Digging in, we found that MongoDB was moving a chunk, and that command requires a "W" exclusive lock at the database level.
We are quite confused because:

  1. We use a hashed _id as the shard key, which should distribute documents evenly, and every document is roughly the same size. Is there any way to find out why MongoDB triggered moveChunk? Or any statistics I can dig into?
  2. Listing all the slow queries, Ops Manager didn't record the moveChunk command as a slow query. How should I change my configuration, or is there anything else I should try?
  3. Based on my case, should I try disabling the balancer to prevent moveChunk and observe the behavior myself?

[Updated]
Here is the operation that is taking the database "W" lock.

    {
        "host" : "....",
        "desc" : "migrateThread",
        "active" : true,
        "currentOpTime" : "2020-08-31T06:34:34.194+0000",
        "opid" : "sh1:-472303473",
        "secs_running" : 15,
        "microsecs_running" : 15562026,
        "op" : "none",
        "ns" : "collection",
        "command" : {},
        "msg" : "step 2 of 6",
        "numYields" : 0.0,
        "locks" : {},
        "waitingForLock" : false,
        "lockStats" : {
            "Global" : {
                "acquireCount" : {
                    "r" : 4,
                    "w" : 4
                }
            },
            "Database" : {
                "acquireCount" : {
                    "w" : 3,
                    "W" : 1
                },
                "acquireWaitCount" : {
                    "W" : 1
                },
                "timeAcquiringMicros" : {
                    "W" : 15665
                }
            },
            "Collection" : {
                "acquireCount" : {
                    "w" : 2,
                    "W" : 1
                },
                "acquireWaitCount" : {
                    "W" : 1
                },
                "timeAcquiringMicros" : {
                    "W" : 7247
                }
            }
        }
    }

Thanks

Best Answer

That was a lot of questions for one question, but here goes:

Is there any way to find out why MongoDB triggered moveChunk?

The balancer will trigger a chunk move for a collection when the number of chunks on one shard exceeds the number of chunks on any other shard by the migration threshold.
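If you want to check how the chunks are actually distributed, one way (a sketch, run from a mongos; "mydb.mycoll" is a placeholder for your sharded namespace) is to count chunks per shard in the config database:

    use config
    // Count chunks per shard for one sharded collection; in MongoDB 3.6 the
    // config.chunks documents are keyed by namespace in the "ns" field
    db.chunks.aggregate([
        { $match: { ns: "mydb.mycoll" } },
        { $group: { _id: "$shard", nChunks: { $sum: 1 } } },
        { $sort: { nChunks: -1 } }
    ])

sh.status() prints a similar per-shard chunk summary for each sharded collection.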

Or any statistics I can dig into?

Each balancer run is logged in the config.actionlog collection, and each individual chunk move should have 4 or 5 entries in the config.changelog collection. Each chunk move should also be logged in mongod.log for both the sending and receiving shards.
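For example, something along these lines (run from a mongos; the namespace is a placeholder) shows the recent balancer rounds and the recent chunk-move events for one collection:

    use config
    // Most recent balancer rounds
    db.actionlog.find({ what: "balancer.round" }).sort({ time: -1 }).limit(5).pretty()
    // Recent moveChunk events (start / commit / from / to) for one collection
    db.changelog.find({ ns: "mydb.mycoll", what: /moveChunk/ }).sort({ time: -1 }).limit(20).pretty()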

List all the slow queries

Each mongod log will show all queries that run longer than 100ms, as well as all moveChunk operations.
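If the 100ms default hides too much, you can lower the threshold; a sketch, run on each shard's mongod (the 50ms value is only an example):

    // Keep the profiler off (level 0) but log any operation slower than 50ms
    db.setProfilingLevel(0, 50)
    // Chunk migrations appear in the mongod log regardless of this threshold;
    // on the shard hosts you can search for them, e.g. grep -i moveChunk mongod.log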

Should I try disabling the balancer to prevent moveChunk and observe the behavior myself?

You might consider configuring a balancer window so the balancer does not run during times that you expect high load.
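A sketch of both options, run from a mongos (the window times are only examples):

    use config
    // Only allow balancing between 01:00 and 05:00
    db.settings.update(
        { _id: "balancer" },
        { $set: { activeWindow: { start: "01:00", stop: "05:00" } } },
        { upsert: true }
    )
    // Or stop the balancer entirely while you investigate, and check its state
    sh.stopBalancer()
    sh.getBalancerState()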

Regarding the lock itself, I had thought MongoDB 3.6 removed the need for the balancer to take a 'W' lock. A chunk move does place a fair bit of load on replication, which might cause other things to back up.
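If you want to watch for that while a migration is in flight, a rough sketch, run on the affected shard's primary:

    // Operations currently blocked waiting on a lock
    db.currentOp({ active: true, waitingForLock: true })
    // How far the secondaries are lagging (replication pressure during a migration)
    rs.printSlaveReplicationInfo()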

Edit

After posting this, I went back to take another look at the code. The only place I see a database-level 'W' lock (aka MODE_X) being taken by the migrateThread is prefaced by this comment in the mongod source:

// Hold the DBLock in X mode across creating the collection and indexes, so that a
// concurrent dropIndex cannot run between creating the collection and indexes and fail with
// IndexNotFound, though the index will get created.
// We could take the DBLock in IX mode while checking if the collection already exists and
// then upgrade it to X mode while creating the collection and indexes, but there is no way
// to upgrade a DBLock once it's taken without releasing it, so we pre-emptively take it in
// mode X.

Reading through that function, it appears to:

  • check if the collection exists
  • if yes, verify the UUID of the collection
  • if no, create the collection
  • if the collection is empty on the local (recipient) shard, and the donor has sent a list of indexes, create those indexes

These operations should be nearly instant, so the total time the lock would be held should be on the order of single-digit milliseconds, at worst.

The migrateThread did have to wait about 15ms to acquire the database 'W' lock, but that doesn't really explain why the migrateThread had already been running for over 15 seconds, or why the other operation spent more than 5 seconds waiting on the database lock.