MongoDB – Mongo queries are slow, yielding/waiting for locks thousands of times

Tags: locking, mongodb, mongodb-3.0, optimization, performance, query-performance

We are running a MongoDB replica set on AWS.
Setup details:
Primary node: 1 x r3.8xlarge
Secondary nodes: 4 x r5.xlarge
Mongo version: 3.0.8 (WiredTiger)
Database size: 358 GB

We have configured MongoDB cloud monitoring for this setup. Here are a few stats from it:

Primary network: 350 MB/s (in: 40 MB/s, out: 310 MB/s, number of requests: 5.32K/s)
1.1K connections on average on the primary node
Tickets available on primary: read: 125, write: 100
Queues: total: 15-20, read: 0-1, write: 15-20

Other points:
CPU and memory stats on the instance look well under control.
We are using an 800 GB gp2 EBS volume (2,400 IOPS) and consuming around 2,000 IOPS. The burst balance is almost at full capacity, so that is not exhausted either.
The primary node is of type r3.8xlarge, so it has 10 Gb networking.
Ulimits are set as follows:

Limit                     Soft Limit    Hard Limit    Units
Max cpu time              unlimited     unlimited     seconds
Max file size             unlimited     unlimited     bytes
Max data size             unlimited     unlimited     bytes
Max stack size            8388608       unlimited     bytes
Max core file size        0             unlimited     bytes
Max resident set          unlimited     unlimited     bytes
Max processes             64000         64000         processes
Max open files            64000         64000         files
Max locked memory         65536         65536         bytes
Max address space         unlimited     unlimited     bytes
Max file locks            unlimited     unlimited     locks
Max pending signals       1967994       1967994       signals
Max msgqueue size         819200        819200        bytes
Max nice priority         0             0
Max realtime priority     0             0
Max realtime timeout      unlimited     unlimited     us

Problem:
We are accessing MongoDB from a Java application, and we are seeing that a lot of our queries take a considerable amount of time, such as 2-4 seconds.
We enabled profiling and listed the slowest queries. There we found that a sample update query acquired an intent lock 17,000 times and had to yield that many times.
We do not understand: if it is an intent lock, why does it need to yield so many times? Does an intent lock not allow other operations to proceed?
If the query has to yield so many times, how do we see the advantage of document-level locking?
We have also seen some of our queries time out (probably waiting too long for a lock and eventually dying).
Could you please guide us on how to debug this problem and improve the performance of our MongoDB deployment?

Here's the profiler output of a sample query:

{
    "op" : "update",
    "ns" : "backend.Scores",
    "query" : {
            "channel.id" : "hkxj",
            "metric" : "YVR",
            "date" : ISODate("2018-11-20T10:00:00Z")
    },
    "updateobj" : {
            "$set" : {
                    "channel" : {
                            "id" : "hkxj",
                            "failures" : 39,
                            "items" : [

                                    {
                                            "_id" : ObjectId("5bf3e434800075956f1"),
                                            "image" : "http://images.abcd.com/something/wscoob.jpg",
                                            "b_time" : ISODate("2018-11-26T19:24:00Z"),
                                            "title" : "What's New ",
                                            "id" : "fq969"
                                    },
                                    {
                                            "_id" : ObjectId("5bf3e43f800075956f0"),
                                            "image" : "http://images.abcd.com/something/w4citv2.jpg",
                                            "broadcast_time" : ISODate("2018-11-26T20:24:00Z"),
                                            "title" : "Let's End This",
                                            "id" : "fsfgd"
                                    }
                            ]
                    },
                    "metric" : "YVR",
                    "date" : ISODate("2018-11-20T10:00:00Z")
            },
            "$setOnInsert" : {
                    "__v" : 0
            }
    },
    "nscanned" : 0,
    "nscannedObjects" : 2209900,
    "nMatched" : 1,
    "nModified" : 1,
    "keyUpdates" : 0,
    "writeConflicts" : 0,
    "numYield" : 17264,
    "locks" : {
            "Global" : {
                    "acquireCount" : {
                            "r" : NumberLong(17266),
                            "w" : NumberLong(17266)
                    }
            },
            "Database" : {
                    "acquireCount" : {
                            "w" : NumberLong(17266)
                    }
            },
            "Collection" : {
                    "acquireCount" : {
                            "w" : NumberLong(17265)
                    }
            },
            "oplog" : {
                    "acquireCount" : {
                            "w" : NumberLong(1)
                    }
            }
    },
    "millis" : 3864,
    "execStats" : {

    },
    "ts" : ISODate("2018-11-20T10:40:25.104Z"),
    "client" : "172.2.3.52",
    "allUsers" : [ ],
    "user" : ""
}

Best Answer

We are accessing mongo using Java application and we are seeing that a lot of our queries are taking considerable amount of time such as 2-4 seconds.

The provided output indicates a query that requires a collection scan of 2,209,900 documents to update a single document:

"nscanned" : 0,
"nscannedObjects" : 2209900,
"nMatched" : 1,
"nModified" : 1,

we found that a sample update query acquired an intent lock 17,000 times and had to yield that many times.

Since a collection scan is a long-running query, it will yield multiple times to allow other operations to interleave. See: Does a read or write operation ever yield the lock? in the MongoDB Concurrency FAQ.

You can significantly improve this query by adding a supporting index on {'channel.id': 1, metric: 1, date: 1}. Use the query explain option (i.e. db.Scores.find({..}).explain(true)) to see candidate indexes and usage.
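For example, checking the plan for the update's query predicate (values taken from the profiler output above) from the mongo shell:

```javascript
// Before the index exists this reports a COLLSCAN; once the index is
// built it should report an IXSCAN on channel.id_1_metric_1_date_1.
db.Scores.find({
    "channel.id": "hkxj",
    "metric": "YVR",
    "date": ISODate("2018-11-20T10:00:00Z")
}).explain(true)
```

With `explain(true)` (allPlansExecution verbosity) you also see rejected candidate plans, which helps confirm the new index is actually chosen.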

Note: to minimise impact in your production environment you'll want to do a background or rolling index build when adding this index. The MongoDB documentation includes more information on procedures and behaviour: Index Build Operations on a Populated Collection.
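On MongoDB 3.0 a background build on the primary would look like the following sketch (slower to build and produces a slightly less compact index, but does not block other operations on the database for the duration):

```javascript
// Foreground builds block all reads and writes on the database until
// the build finishes; background: true avoids that at the cost of
// build speed.
db.Scores.createIndex(
    { "channel.id": 1, "metric": 1, "date": 1 },
    { background: true }
)
```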

We enabled profiling and listed slowest queries.

If your deployment is already stressed, I would caution against enabling the Database Profiler as this will add additional write load and only capture a limited sample of queries (by default profiling uses a 1MB capped collection for each database).
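If the profiler was left enabled, its status can be checked and it can be switched off from the shell (operations slower than the `slowms` threshold are still written to the server log with the profiler off):

```javascript
// Show the current profiling level and slow-operation threshold,
// e.g. { "was": 1, "slowms": 100 }.
db.getProfilingStatus()

// Level 0 disables the profiler; operations slower than 100 ms are
// still logged to mongod.log.
db.setProfilingLevel(0, 100)
```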

Since slow query information is already available in the MongoDB server logs, I would instead take a log-based approach. Some general tips can be found on ServerFault: Understanding IXSCAN and COLLSCAN in MongoDB logs.
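As a rough first pass over the logs (the log path and the position of the namespace field are assumptions; adjust both for your deployment):

```shell
# List the namespaces responsible for the most collection scans among
# slow operations logged by mongod (3.0-era log lines put the namespace
# in the sixth whitespace-separated field).
grep COLLSCAN /var/log/mongodb/mongod.log |
  awk '{ print $6 }' |
  sort | uniq -c | sort -rn | head
```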

What I found is the time to acquire the lock is 2,497,196 micros (2,497 millis). How can I reduce the time required to acquire a lock?

Significant time acquiring locks suggests either contention or a resource constraint (for example, I/O). If you have optimised all of your slow queries you'd have to look into more general metrics and what is logged during periods of notable slowness. Your question description suggests that you aren't concerned about resource limits yet, so logs and serverStatus metrics are a more likely focus.
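During a period of slowness, the ticket and queue counters from serverStatus are a good first check. WiredTiger in 3.0 allows 128 concurrent read and 128 concurrent write transactions by default; "available" near zero together with a growing queue indicates operations waiting on concurrency limits:

```javascript
// Concurrency tickets: "out" is in use, "available" is what remains
// of the 128 read / 128 write tickets.
db.serverStatus().wiredTiger.concurrentTransactions

// Operations currently queued waiting for locks.
db.serverStatus().globalLock.currentQueue
```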

Could you please guide us on how to debug this problem and improve the performance of our mongo ?

Overall performance tuning is a much bigger topic than an answer here, however some suggested starting points are:

  • Add a monitoring service to capture & visualise MongoDB metrics and history. If you don't have a solution in place yet, I'd look for a SaaS service like MongoDB Cloud Manager which has a free tier for monitoring (and more detailed metrics available with a free trial).
  • Investigate common slow query patterns and create appropriate indexes.
  • Upgrade to the latest version of MongoDB 3.0. MongoDB 3.0.8 was released in Dec 2015; the final version in that series was 3.0.15 (released in May 2017). There have been a lot of improvements in performance and stability, so it would be good to minimise the possibility you are encountering known issues that have been addressed.
  • Review the MongoDB Production Notes for suggestions relevant to your deployment environment.
  • Plan and test your upgrade path to a supported release series (currently MongoDB 3.4 or newer). WiredTiger was first introduced in the MongoDB 3.0 release series and there have been many significant improvements in successive releases of MongoDB.