I manage a MongoDB replica set for one of my customer and sometimes he send me a ticket (email) with only one argument :
~ WARNING Date/Time: ~Tue Jul 4 02:39:33 GMT 2017
Summary: ~WARNING - Queries / Sec: 9.674419
How understand this kind of message? I could guess that we have some slow query running. So the first thing I do it's to check the log. What i found in it's that we have some aggregate command that take 500-600ms to execute. Here an example :
2017-07-04T04:08:52.205Z I COMMAND [conn2104435]
command databaseName.cards_PWCD900004
command: aggregate { aggregate: "cards_PWCD900004",
pipeline: [ { $unwind: "$virtualCards" },
{ $project: { cardProfileId: 1, virtualCards.state: 1,
virtualCards.device: 1,
virtualCards.primaryDpanId: { $ifNull: [ "$virtualCards.primaryDpanId", null ] } } },
{ $group: { _id: { cardProductId: "$cardProfileId",
virtualCardsState: "$virtualCards.state",
deviceId: "$virtualCards.device" }, tokensNumber: { $sum: 1 },
virtualCardsNumber: { $sum: { $cond: { if: { $eq: [ "$virtualCards.primaryDpanId", null ] },
then: 1, else: 0 } } } } }, { $group: { _id: { issuerName: "APLUS",
cardProductId: "$_id.cardProductId",
virtualCardsState: "$_id.virtualCardsState" },
devicesNumber: { $sum: 1 }, virtualCardsNumber: { $sum: "$virtualCardsNumber" },
tokensNumber: { $sum: "$tokensNumber" } } } ] }
keyUpdates:0 writeConflicts:0 numYields:219 reslen:30763
locks:{ Global: { acquireCount: { r: 460 } },
Database: { acquireCount: { r: 230 } },
Collection: { acquireCount: { r: 230 } } } protocol:op_query 523ms
What I would like to know is how troubleshoot such thing? I could launch a mongotop or mongostat but if the issue just finished I will see nothing. I also could use db.serverStatus() but I don't really know which variables is the most significant.
database size : 2GB
SSL : true
About mongo.conf all it's by default. Here more information about hardware:
RAM : 8GB
2 proc : 2,3 GHz each
What are best practices an administrator should have to check if all works on his database ?
If you have any question don't hesitate please.
Best Answer
Lets first look at few of the output terms to understand better.
numYields
The number of times the operation yielded to allow other operations to complete. Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete while MongoDB reads in data for the yielding operation. Does a read or write operation ever yield the lock?
reslen
The length in bytes of the operation’s result document. A large responseLength can affect performance. To limit the size of the result document for a query operation, you can use any of the following:
acquireCount
Number of times the operation acquired the lock in the specified mode.
How do you troubleshoot this thing?
I am taking a guess here as I cannot see your query plan. I suggest using explain method and looking at the plan. Especially focusing on index usage, scan/seek.
See this answer about high number of yields.
Monitor your instance for some key matrix which will help you tune your long running queries. In your case I would especially focus on
index misses
.