MongoDB – Why does findOne() hang on a sharded collection?

mongodb, mongodb-3.6, sharding

A MongoDB v3.6.4 database with a large sharded collection on the WiredTiger storage engine hangs when I issue findOne() or find() without any parameters. The collection has only the default index on the _id key.

If findOne() or find() is invoked directly on any of the shards, it returns immediately. If it is invoked through a mongos, it hangs. Why does this query hang?

According to db.currentOp(), MongoDB is using COLLSCAN.

The funny thing is that this worked until yesterday and then suddenly stopped.

As a note, a sharding migration process is moving chunks, but that doesn't seem to be the cause: even when chunk balancing is stopped or has finished, findOne() still hangs (or doesn't hang but scans the whole collection).

Here is the output of db.currentOp() taken from one primary:

            {
                    "host" : "db1:27017",
                    "desc" : "conn277",
                    "connectionId" : 277,
                    "client" : "10.240.137.10:57294",
                    "appName" : "MongoDB Shell",
                    "clientMetadata" : {
                            "application" : {
                                    "name" : "MongoDB Shell"
                            },
                            "driver" : {
                                    "name" : "MongoDB Internal Client",
                                    "version" : "3.6.4"
                            },
                            "os" : {
                                    "type" : "Linux",
                                    "name" : "PRETTY_NAME=\"Debian GNU/Linux 9 (stretch)\"",
                                    "architecture" : "x86_64",
                                    "version" : "Kernel 4.9.0-6-amd64"
                            },
                            "mongos" : {
                                    "host" : "m1:27017",
                                    "client" : "10.240.0.0:38190",
                                    "version" : "3.6.4"
                            }
                    },
                    "active" : true,
                    "currentOpTime" : "2018-04-28T08:13:26.371+0200",
                    "opid" : 2148656,
                    "secs_running" : NumberLong(4),
                    "microsecs_running" : NumberLong(4344989),
                    "op" : "query",
                    "ns" : "somedb.somecol",
                    "command" : {
                            "find" : "somecol",
                            "limit" : NumberLong(1),
                            "shardVersion" : [
                                    Timestamp(188696, 1),
                                    ObjectId("5ac6b2abbd8bbc9f42f34a39")
                            ],
                            "$clusterTime" : {
                                    "clusterTime" : Timestamp(1524896001, 1),
                                    "signature" : {
                                            "hash" : BinData(0,"RuV/v6Qm7H9AvPMVRNH0jkdIwRM="),
                                            "keyId" : NumberLong("6540622458888650772")
                                    }
                            },
                            "$client" : {
                                    "application" : {
                                            "name" : "MongoDB Shell"
                                    },
                                    "driver" : {
                                            "name" : "MongoDB Internal Client",
                                            "version" : "3.6.4"
                                    },
                                    "os" : {
                                            "type" : "Linux",
                                            "name" : "PRETTY_NAME=\"Debian GNU/Linux 9 (stretch)\"",
                                            "architecture" : "x86_64",
                                            "version" : "Kernel 4.9.0-6-amd64"
                                    },
                                    "mongos" : {
                                            "host" : "m1:27017",
                                            "client" : "10.240.0.0:38190",
                                            "version" : "3.6.4"
                                    }
                            },
                            "$configServerState" : {
                                    "opTime" : {
                                            "ts" : Timestamp(1524896001, 1),
                                            "t" : NumberLong(3)
                                    }
                            },
                            "$db" : "feedback"
                    },
                    "planSummary" : "COLLSCAN",
                    "numYields" : 803,
                    "locks" : {
                            "Global" : "r",
                            "Database" : "r",
                            "Collection" : "r"
                    },
                    "waitingForLock" : false,
                    "lockStats" : {
                            "Global" : {
                                    "acquireCount" : {
                                            "r" : NumberLong(1608)
                                    }
                            },
                            "Database" : {
                                    "acquireCount" : {
                                            "r" : NumberLong(804)
                                    }
                            },
                            "Collection" : {
                                    "acquireCount" : {
                                            "r" : NumberLong(804)
                                    }
                            }
                    }
            },

Best Answer

According to db.currentOp(), MongoDB uses COLLSCAN. Why would MongoDB want to use COLLSCAN for this?

COLLSCAN stands for a collection scan. A find() or findOne() with no filter has no predicate that an index could serve, so a collection scan is the expected plan. According to the MongoDB documentation (changed in version 3.0), MongoDB provides the db.collection.explain() method, the cursor.explain() method, and the explain command to return information on query plans and execution statistics of the query plans.

The explain results present the query plans as a tree of stages. Each stage passes its results (i.e. documents or index keys) to the parent node. The leaf nodes access the collection or the indices. The internal nodes manipulate the documents or the index keys that result from the child nodes. The root node is the final stage from which MongoDB derives the result set.

Some stage names are descriptive of the operation, e.g.:

  • IXSCAN for scanning index keys
  • FETCH for retrieving documents
  • SHARD_MERGE for merging results from shards
  • SHARDING_FILTER for filtering out orphan documents from shards
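
To make the tree-of-stages idea concrete, here is a minimal sketch that walks an explain-style plan and lists its stage names from the root down. The sample plan is hand-written for illustration (it is not output from the cluster in the question), the shard name is hypothetical, and the walker assumes the field names "stage", "inputStage", "inputStages" and, for sharded plans, "shards[].winningPlan".

```javascript
// Collect stage names from an explain()-style plan tree, root first.
// Assumed field names: "stage", "inputStage", "inputStages", and
// "shards[].winningPlan" for plans reported through a mongos.
function collectStages(node, out = []) {
  if (!node || typeof node !== "object") return out;
  if (typeof node.stage === "string") out.push(node.stage);
  if (node.inputStage) collectStages(node.inputStage, out);
  for (const child of node.inputStages || []) collectStages(child, out);
  for (const shard of node.shards || []) collectStages(shard.winningPlan, out);
  return out;
}

// Hand-written sample plan, NOT captured from a real cluster.
const samplePlan = {
  stage: "SHARD_MERGE",
  shards: [
    {
      shardName: "shardA", // hypothetical shard name
      winningPlan: {
        stage: "SHARDING_FILTER",
        inputStage: { stage: "COLLSCAN", direction: "forward" },
      },
    },
  ],
};

const stages = collectStages(samplePlan);
console.log(stages.join(" -> ")); // SHARD_MERGE -> SHARDING_FILTER -> COLLSCAN
```

In the mongo shell, the same function could presumably be applied to the queryPlanner.winningPlan field of whatever db.somecol.explain().find().limit(1) returns, provided the output follows the shape assumed above.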

db.currentOp() returns a document that contains information on in-progress operations for the database instance.

db.currentOp() has the following form:

db.currentOp(<operations>)

where <operations> is optional and specifies the operations to report on. It can be either a boolean or a document. Specify true to include operations on idle connections and system operations. Specify a document with query conditions to report only on operations that match the conditions.
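
As a rough illustration of the document form, the sketch below builds a filter that could be passed to db.currentOp() and, separately, a small helper that picks long-running collection scans out of an already fetched inprog array. The 3-second threshold, the helper name slowCollScans, and the second sample operation are assumptions made for the example. The first sample operation reuses values from the output in the question. The helper also assumes secs_running is a plain number.

```javascript
// Filter document that could be passed in the mongo shell as:
//   db.currentOp(currentOpFilter)
// The namespace comes from the question; the 3-second threshold is arbitrary.
const currentOpFilter = {
  active: true,
  ns: "somedb.somecol",
  secs_running: { $gt: 3 },
};

// Hypothetical helper: from the "inprog" array of a currentOp result,
// keep active operations that are collection scans running for at
// least minSecs seconds.
function slowCollScans(inprog, minSecs) {
  return inprog.filter(
    (op) =>
      op.active === true &&
      op.planSummary === "COLLSCAN" &&
      Number(op.secs_running) >= minSecs
  );
}

// Sample data: the first entry mirrors the question's output,
// the second entry is invented for contrast.
const sampleInprog = [
  { opid: 2148656, active: true, secs_running: 4, ns: "somedb.somecol", planSummary: "COLLSCAN" },
  { opid: 1, active: true, secs_running: 0, ns: "somedb.other", planSummary: "IXSCAN { _id: 1 }" },
];

const slow = slowCollScans(sampleInprog, 3);
console.log(slow.map((op) => op.opid));
```

Filtering on the server with a document keeps the result small. The local helper is only useful when the full output has already been captured, for example from a saved log.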