I'm troubleshooting an issue with an application that uses MongoDB, and I've just started using the mongostat utility. Does the "conn" column show the number of established, authenticated TCP connections to the database, or only the number of connections that are actually running an operation against the database?
Mongodb – What does “conn” mean in mongostat utility
mongodb
Related Solutions
First, let me say that with the information given it will be hard to get at the root cause; this is usually an iterative process that takes multiple attempts to track down the culprit. In the interest of answering the "what next?" part of your question rather than identifying the root cause, read on.
A couple of recommendations to start:
- Get the host into MMS (it's free) - see http://mms.10gen.com - so you can graph stats over time and get a view of the issues without having to sit on the box running commands
- Get munin-node installed too, so you can correlate ops etc. with IO (install docs for MMS explain this).
Next, a couple of quick checks for common causes:
- What are your filesystem and kernel? These generally need to be ext4/XFS on a kernel recent enough for fallocate to work (2.6.23 for ext4, 2.6.25 for XFS) so that new file allocation is not slow
- Assuming you don't get MMS and munin installed, capture iostat output to match up with mongostat and determine whether IO is the root cause of the bottleneck
- Do you do any periodic batch updates that grow the documents significantly (i.e. that would cause moves)? Moves are expensive and can cause IO to get backed up
- Can your disk keep up with the volume of data you are writing to it? MongoDB fsyncs to disk every 60 seconds by default; if the amount that needs to be synced after 60 seconds is massive (say, because of an insert spike), you can also run into issues
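The kernel part of that checklist can be automated. Below is a small Python sketch to run on the database host; the 2.6.25 threshold is the XFS figure from the list above, and the version parsing is a deliberate simplification:

```python
import platform

# Minimum kernel for working fallocate on XFS, per the checklist above
# (ext4 gained it in 2.6.23).
MIN_KERNEL = (2, 6, 25)

def kernel_version():
    """Parse platform.release(), e.g. '5.15.0-91-generic', into an int tuple."""
    release = platform.release().split("-")[0]
    return tuple(int(p) for p in release.split(".")[:3])

if __name__ == "__main__":
    kv = kernel_version()
    ok = kv >= MIN_KERNEL
    label = "fallocate-capable" if ok else "too old for fast file allocation"
    print("kernel %s: %s" % (".".join(map(str, kv)), label))
```

Pair this with `df -T` on your dbpath to confirm the filesystem is ext4 or XFS.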
That's not an exhaustive list (I have seen other issues cause this), but it should get you started down the right path.
A few patterns are evident in your data:
- The primary activity is updates.
- Updates continue for several seconds after the queries/inserts.
- The criteria fields in your finds are indexed.
- Even so, the 'index miss' rate is very high.
- Background flush is very low (as it should be with SSD) so you're not getting I/O contention.
- Lock-average is extremely high.
Based on this, I suspect that your documents are created small and continually appended to, and that there is significant variation in final document sizes, which renders Mongo's padding-factor optimization less useful. If that's happening, you're running into a particular write penalty that goes like this:
- Document is INSERTed, Mongo allocates extra free space due to padding factor. (write: 512B)
- Document is updated. (write 1KB)
- Document is updated. (write 1.5KB)
- Document is updated, but there isn't any adjacent free-space left.
- Mongo moves the Document to a new spot with free space.
- Mongo writes the whole, appended document in a new free block (write 2KB) and marks the old block as available.
- Mongo reindexes every indexable field on the document (depending on what kind of indexes you have, this could be a big write hit).
- Document is updated. (write 512B)
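The sequence above can be sketched with a toy simulation. The sizes and the fixed 1.5x padding multiplier below are illustrative assumptions, not MongoDB's actual allocator behavior:

```python
# Toy model of the write penalty above: a record grows by repeated
# appends; when it outgrows its allocated slot (its size at allocation
# time multiplied by a padding factor), it must "move", i.e. be
# rewritten in full at a new location.
PADDING_FACTOR = 1.5  # illustrative; real padding factors adapt per collection

def simulate(initial_size, append_size, n_updates):
    size = initial_size
    allocated = size * PADDING_FACTOR
    moves, bytes_written = 0, size
    for _ in range(n_updates):
        size += append_size
        if size > allocated:
            # Document no longer fits: the whole document is rewritten
            # elsewhere, and index entries pointing at it must be updated too.
            moves += 1
            allocated = size * PADDING_FACTOR
            bytes_written += size          # full rewrite
        else:
            bytes_written += append_size   # in-place append
    return moves, bytes_written

if __name__ == "__main__":
    moves, written = simulate(initial_size=512, append_size=512, n_updates=8)
    print("%d moves, %d bytes written" % (moves, written))
```

Even in this toy model, steady appends trigger repeated moves, and each move rewrites far more bytes than the append itself.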
Stub records that are continually appended to are a bit of an anti-pattern with MongoDB due to the penalty of growing beyond the padding factor. You can get away with it if your documents end up about the same size, as the padding factor can compensate.
However, if you have to continually append data to records and can't rely on the padding factor, you'll have to pad manually at insert time. When you create the record, add enough junk fields to bring it close to your average final document size, then delete/unset the junk field on your first update. This reduces the incidence of moves like the above and should bring your lock average down.
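A minimal sketch of that manual-padding trick, in Python. The target size and the `_padding` field name are made-up examples, and `len(json.dumps(...))` is a rough stand-in for real BSON sizing (pymongo users would measure with `bson.BSON.encode` instead):

```python
import json

TARGET_SIZE = 2048  # assumed average final document size, in bytes

def pad_document(doc, target_size=TARGET_SIZE):
    """Return a copy of `doc` with a junk `_padding` field sized so the
    serialized document is roughly `target_size` bytes at insert time."""
    current = len(json.dumps(doc))
    pad_len = max(0, target_size - current)
    padded = dict(doc)
    padded["_padding"] = "x" * pad_len
    return padded

# On the first real update, drop the junk field along with the change,
# e.g. (pymongo-style, hypothetical collection/fields):
#   collection.update_one({"_id": doc_id},
#                         {"$set": {"events": events},
#                          "$unset": {"_padding": ""}})

if __name__ == "__main__":
    padded = pad_document({"_id": 1, "events": []})
    print(len(json.dumps(padded)))
```

The record then shrinks into space it already owns as real data replaces the junk, instead of growing past its allocation.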
I also suspect you're running into full table scans to return records; that's what the 'idx miss' column in the mongostat output reflects. Update calls run finds too - that's how the system locates the record to update. An index miss forces a double read: first the lookup in the index that misses, then a full table scan to find the record. Typically this is caused by not having enough RAM to hold the index, but it can also happen when an UPDATE modifies an indexed field.
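The cost gap between an indexed lookup and a full scan can be illustrated with pure-Python stand-ins (a dict playing the role of the index; this is not MongoDB internals, just the shape of the problem):

```python
# Toy illustration: a hash "index" finds a record in one probe, while a
# full scan touches every record in the worst case.
records = [{"_id": i, "user": "user%d" % i} for i in range(10_000)]
index = {r["user"]: pos for pos, r in enumerate(records)}  # field -> position

def indexed_find(user):
    pos = index.get(user)              # one probe
    return records[pos] if pos is not None else None

def full_scan(user):
    touched = 0
    for r in records:                  # linear walk over every record
        touched += 1
        if r["user"] == user:
            return r, touched
    return None, touched

if __name__ == "__main__":
    print(indexed_find("user9999")["_id"])
    _, touched = full_scan("user9999")
    print("full scan touched %d records" % touched)
```

When the working set doesn't fit in RAM, each of those "touches" can become a disk read, which is why a high idx miss rate shows up as IO pressure and long lock times.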
Best Answer
Ref: https://docs.mongodb.com/manual/reference/program/mongostat/
It is the former: "conn" is the total number of open connections, meaning all authenticated connections, not just those actively running an operation. They can be running, runnable, waiting, or sleeping.
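If you're capturing mongostat output to a log, the conn figure is just another whitespace-separated column. Here is a small parsing sketch; the header and sample line below are illustrative, not real output, since mongostat's columns vary by version:

```python
# Pull the "conn" value out of a captured mongostat header/value pair.
# Both lines below are made-up examples for illustration.
HEADER = ("insert query update delete getmore command dirty used "
          "flushes vsize res qrw arw net_in net_out conn time")
SAMPLE = ("    12    340     88      0       5      10|0  0.1% 2.3% "
          "      0 1.5G 512M 0|0 1|0  45k   120k   87 10:15:01")

def conn_count(header_line, value_line):
    cols = header_line.split()
    vals = value_line.split()
    return int(vals[cols.index("conn")])

if __name__ == "__main__":
    print(conn_count(HEADER, SAMPLE))
```

For a point-in-time check from the shell, `db.serverStatus().connections` reports the same information as `current` and `available` counts.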