MongoDB – Why Did My Query Run Very Slow on a Mongos?

mongodb

I have a mongos which connects to three shard cluster. I have a few databases and collections in this mongos instance. All other collections work fine except one has a performance issue. It takes 45 minutes to run below command

db.orders.find().explain('executionStats')

The total number document in this collection is 111898400. Each shard cluster has about 30000000 documents. I know that I didn't run this query against the shard key but I just wonder whether this is the reason why it takes so long. I expect it takes a few minutes at most but not 45 minutes.

Below is the execution stats from above command. How can I profile it more to find out why it takes so long?

"executionStats" : {
        "nReturned" : 111432926,
        "executionTimeMillis" : 2727745,
        "totalKeysExamined" : 0,
        "totalDocsExamined" : 111899059,
        "executionStages" : {
            "stage" : "SHARD_MERGE",
            "nReturned" : 111432926,
            "executionTimeMillis" : 2727745,
            "totalKeysExamined" : 0,
            "totalDocsExamined" : 111899059,
            "totalChildMillis" : NumberLong(6550367),
            "shards" : [
                {
                    "shardName" : "s0",
                    "executionSuccess" : true,
                    "executionStages" : {
                        "stage" : "SHARDING_FILTER",
                        "nReturned" : 36545358,
                        "executionTimeMillisEstimate" : 2723505,
                        "works" : 36613353,
                        "advanced" : 36545358,
                        "needTime" : 67994,
                        "needYield" : 0,
                        "saveState" : 321990,
                        "restoreState" : 321990,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "chunkSkips" : 67993,
                        "inputStage" : {
                            "stage" : "COLLSCAN",
                            "nReturned" : 36613351,
                            "executionTimeMillisEstimate" : 2689811,
                            "works" : 36613353,
                            "advanced" : 36613351,
                            "needTime" : 1,
                            "needYield" : 0,
                            "saveState" : 321990,
                            "restoreState" : 321990,
                            "isEOF" : 1,
                            "invalidates" : 0,
                            "direction" : "forward",
                            "docsExamined" : 36613351
                        }
                    }
                },
                {
                    "shardName" : "s1",
                    "executionSuccess" : true,
                    "executionStages" : {
                        "stage" : "SHARDING_FILTER",
                        "nReturned" : 36423091,
                        "executionTimeMillisEstimate" : 1647595,
                        "works" : 36423093,
                        "advanced" : 36423091,
                        "needTime" : 1,
                        "needYield" : 0,
                        "saveState" : 305586,
                        "restoreState" : 305586,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "chunkSkips" : 0,
                        "inputStage" : {
                            "stage" : "COLLSCAN",
                            "nReturned" : 36423091,
                            "executionTimeMillisEstimate" : 1614736,
                            "works" : 36423093,
                            "advanced" : 36423091,
                            "needTime" : 1,
                            "needYield" : 0,
                            "saveState" : 305586,
                            "restoreState" : 305586,
                            "isEOF" : 1,
                            "invalidates" : 0,
                            "direction" : "forward",
                            "docsExamined" : 36423091
                        }
                    }
                },
                {
                    "shardName" : "s2",
                    "executionSuccess" : true,
                    "executionStages" : {
                        "stage" : "SHARDING_FILTER",
                        "nReturned" : 38464477,
                        "executionTimeMillisEstimate" : 2166740,
                        "works" : 38862619,
                        "advanced" : 38464477,
                        "needTime" : 398141,
                        "needYield" : 0,
                        "saveState" : 331525,
                        "restoreState" : 331525,
                        "isEOF" : 1,
                        "invalidates" : 0,
                        "chunkSkips" : 398140,
                        "inputStage" : {
                            "stage" : "COLLSCAN",
                            "nReturned" : 38862617,
                            "executionTimeMillisEstimate" : 2128083,
                            "works" : 38862619,
                            "advanced" : 38862617,
                            "needTime" : 1,
                            "needYield" : 0,
                            "saveState" : 331525,
                            "restoreState" : 331525,
                            "isEOF" : 1,
                            "invalidates" : 0,
                            "direction" : "forward",
                            "docsExamined" : 38862617
                        }
                    }
                }
            ]
        }
    },

Best Answer

Well, the size of the collection says it all. The query retrieves all the documents at their full shape from all shards, sends them over the network and then merges them into a single result.

Actually, at this number of records, I wouldn't find 45 minutes too much for a fullscan.