MongoDB – What factors contribute to lock-percentage with MongoDB

locking, mongodb

We're attempting to optimize how we're using our MongoDB instance. We routinely see high lock percentages and are looking to minimize them. Here is some mongostat output:

insert  query update delete getmore command flushes mapped  vsize    res faults locked % idx miss %     qr|qw   ar|aw  netIn netOut  conn       time 
     1    107    186      0       0     196       0  3.06g   7.3g   333m      0     11.2          0       0|0     2|0    66k   224k    85   15:55:22 
     2    102    285      0       0     296       0  3.06g   7.3g   333m      0     15.7          0       0|0     2|0    89k   216k    84   15:55:23 
     2     79    325      0       0     335       0  3.06g   7.3g   333m      0     20.2          0       0|0     3|0    96k   149k    85   15:55:24 
     2     92    193      0       0     203       0  3.06g   7.3g   333m      0     10.9          0       1|1     6|1    63k   149k    86   15:55:25 
     3    102    235      0       0     245       0  3.06g   7.3g   331m      0     14.5          0       0|0     2|0    75k   177k    84   15:55:26 
     3     79    267      0       0     275       0  3.06g   7.3g   331m      0     16.5          0       1|0     2|0    80k   133k    86   15:55:27 
     2     66    219      0       0     226       0  3.06g   7.3g   264m      0     14.3          0       0|0     2|0    66k   112k    88   15:55:28 
     2    100    201      0       0     211       0  3.06g   7.3g   334m      0     10.2          0       0|0     3|0    67k   142k    87   15:55:29 
     3    118    227      0       0     244       0  3.06g   7.3g   322m      0     13.8          0       3|1     6|1    78k   150k    87   15:55:30 
     2    112    189      0       0     198       0  3.06g   7.3g   334m      0     10.8          0       0|1     2|2    64k   213k    87   15:55:31 
     2     80    266      0       0     278       0  3.06g   7.3g   246m      0     15.8          0       0|1     3|1    82k   179k    86   15:55:32 
     1     82    307      0       0     314       0  3.06g   7.3g   334m      0     18.1          0       0|0     2|0    89k   158k    86   15:55:33 
     2     94    278      0       0     285       0  3.06g   7.3g   334m      0     17.1          0       0|0     0|0    83k   184k    86   15:55:34 
     3    101    246      0       0     256       0  3.06g   7.3g   332m      0     14.2          0       0|0     1|0    82k   179k    86   15:55:35 
     3     99    203      0       0     213       0  3.06g   7.3g   334m      0     12.5          0       0|0     2|0    67k   154k    88   15:55:36 
     2    115    174      0       0     189       0  3.06g   7.3g   335m      0       11          0       1|0     3|0    63k   172k    88   15:55:37 
     2     97    199      0       0     209       0  3.06g   7.3g   335m      0     10.3          0       0|0     2|0    65k   192k    87   15:55:38 
     2    103    366      0       0     373       0  3.06g   7.3g   334m      0     23.5          0       1|4     3|4   107k   256k    85   15:55:39 
     2    105    338      0       0     349       0  3.06g   7.3g   334m      0     22.9          0       0|0     1|0   101k   207k    83   15:55:40 

This is a lot better than it used to be, thanks to better indexing. However, we clearly have more to do. Some things about this data-set:

  • Hardware is a 4-proc box, load-average is generally between 1.3 and 1.9
  • 4GB of RAM
  • The SAN-backed storage reports latencies peaking at 35ms, but generally between 5ms and 20ms.
  • I/O operations are very low
  • The 'qr' and 'qw' numbers suggest we're not suffering from significant queuing.

We're using Mongo to track metadata as documents pass through our processing platform. A Mongo document is created for each actual document we have (the actual documents are ye olde Office-type files). Each processing stage queries some information and then writes information back (sometimes quite a bit of it). Depending on the data we're working with, there can be many stages.

This is an update-heavy workload, so lock percentage is a key scaling statistic for us. We haven't sharded yet, in large part because we want to see how far a single instance can scale before we do.
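To illustrate the access pattern, each stage does something roughly like this (pymongo; the collection and field names are made up for the example, the real schema is more involved):

    from datetime import datetime
    from pymongo import MongoClient

    coll = MongoClient().pipeline.documents   # illustrative database/collection names

    def run_stage(doc_id, stage_name, process):
        # Each stage reads the metadata accumulated so far for this document...
        meta = coll.find_one({"_id": doc_id})

        # ...does its work, then writes its results back; depending on the
        # data, this can be a substantial amount of new metadata.
        result = process(meta)
        coll.update_one(
            {"_id": doc_id},
            {"$set": {"stages." + stage_name: result,
                      "updated_at": datetime.utcnow()}},
        )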

What other areas do we need to investigate to reduce lock-percentage, or have we just hit the wall and need to shard?

Best Answer

Those are interesting stats. I think you might be suffering from document growth on your updates: when an update grows a document beyond its allocated space, the whole document has to be copied to a new spot on disk, and that work happens while the write lock is held. If this is indeed the case, you might be able to recover some of that lock percentage by manually padding your documents. It adds a bit of complexity on the first insert, but isn't too bad. See this page in the official docs: http://www.mongodb.org/display/DOCS/Padding+Factor#PaddingFactor-ManualPadding
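If you want to experiment with it, the insert side looks roughly like the sketch below (pymongo; the collection name, padding field, and 2 KB size are illustrative guesses, not something from your setup):

    from pymongo import MongoClient

    client = MongoClient()                 # assumes a local mongod on the default port
    docs = client.metadata.documents       # illustrative database/collection names

    PAD_BYTES = 2048  # guess at how much metadata the later stages will add

    def insert_padded(doc):
        # Insert with a throwaway filler field so the record is allocated
        # larger than the initial document actually needs...
        doc["padding"] = "x" * PAD_BYTES
        _id = docs.insert_one(doc).inserted_id

        # ...then remove the filler. The extra space stays with the record,
        # so later $set updates can usually grow the document in place
        # instead of forcing a copy to a new location on disk.
        docs.update_one({"_id": _id}, {"$unset": {"padding": ""}})
        return _id

The $unset leaves the record allocated at the larger size, so subsequent stages that add metadata can usually update in place.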

Just an idea...