MongoDB – Troubleshooting Query Per Second Performance

mongodb-3.2performancequery-performance

I manage a MongoDB replica set for one of my customer and sometimes he send me a ticket (email) with only one argument :

~ WARNING Date/Time: ~Tue Jul 4 02:39:33 GMT 2017 
Summary: ~WARNING - Queries / Sec: 9.674419

How understand this kind of message? I could guess that we have some slow query running. So the first thing I do it's to check the log. What i found in it's that we have some aggregate command that take 500-600ms to execute. Here an example :

2017-07-04T04:08:52.205Z I COMMAND  [conn2104435] 
command databaseName.cards_PWCD900004 
command: aggregate { aggregate: "cards_PWCD900004", 
pipeline: [ { $unwind: "$virtualCards" }, 
{ $project: { cardProfileId: 1, virtualCards.state: 1, 
virtualCards.device: 1, 
virtualCards.primaryDpanId: { $ifNull: [ "$virtualCards.primaryDpanId", null ] } } }, 
{ $group: { _id: { cardProductId: "$cardProfileId", 
virtualCardsState: "$virtualCards.state", 
deviceId: "$virtualCards.device" }, tokensNumber: { $sum: 1 }, 
virtualCardsNumber: { $sum: { $cond: { if: { $eq: [ "$virtualCards.primaryDpanId", null ] }, 
then: 1, else: 0 } } } } }, { $group: { _id: { issuerName: "APLUS", 
cardProductId: "$_id.cardProductId", 
virtualCardsState: "$_id.virtualCardsState" }, 
devicesNumber: { $sum: 1 }, virtualCardsNumber: { $sum: "$virtualCardsNumber" }, 
tokensNumber: { $sum: "$tokensNumber" } } } ] } 
keyUpdates:0 writeConflicts:0 numYields:219 reslen:30763 
locks:{ Global: { acquireCount: { r: 460 } }, 
Database: { acquireCount: { r: 230 } }, 
Collection: { acquireCount: { r: 230 } } } protocol:op_query 523ms

What I would like to know is how troubleshoot such thing? I could launch a mongotop or mongostat but if the issue just finished I will see nothing. I also could use db.serverStatus() but I don't really know which variables is the most significant.

database size : 2GB
SSL : true

About mongo.conf all it's by default. Here more information about hardware:

RAM : 8GB
2 proc : 2,3 GHz each

What are best practices an administrator should have to check if all works on his database ?

If you have any question don't hesitate please.

Best Answer

Lets first look at few of the output terms to understand better.

numYields

The number of times the operation yielded to allow other operations to complete. Typically, operations yield when they need access to data that MongoDB has not yet fully read into memory. This allows other operations that have data in memory to complete while MongoDB reads in data for the yielding operation. Does a read or write operation ever yield the lock?

reslen

The length in bytes of the operation’s result document. A large responseLength can affect performance. To limit the size of the result document for a query operation, you can use any of the following:

  • Projections
  • The limit() method
  • The batchSize() method

acquireCount

Number of times the operation acquired the lock in the specified mode.

How do you troubleshoot this thing?

I am taking a guess here as I cannot see your query plan. I suggest using explain method and looking at the plan. Especially focusing on index usage, scan/seek.

See this answer about high number of yields.

Monitor your instance for some key matrix which will help you tune your long running queries. In your case I would especially focus on index misses.

  • mongotop - shows how much time was spend reading or writing each collection over the last second
  • mongostat - brilliant live debug tool, gives a view on all your connected MongoDB instances