MongoDB Insert Performance

Tags: aggregate, mongodb, performance

I have some data that is aggregated in chunks and later pushed into another collection.

The problem is that pushing the aggregated data into the other collection is very slow (it takes a few hours), and I can't optimize it because I can't find the bottleneck.

The machine on which MongoDB is running is a single node with 24 GB of RAM.

Basically, the whole process looks like this:

// Aggregated results: about 17k entries
var cursor = db.collection.runCommand("aggregate", { ... });

Then I iterate over it:

cursor.result.forEach(function(data, index) {
    // Try to increment the matching array element in place.
    var inc = db.runCommand({ findAndModify : 'another_collection', query : query, update : { $inc : update } });
    if (inc.value == null) {
        // Nothing matched: push a new element onto the array instead.
        var up = db.another_collection.update(query, { $push : push });
        if (up.nMatched == 0) {
            // The document does not exist yet, so create it.
            db.another_collection.insert({});
        }
    }
});

What I'm doing in the code above is just pushing values onto an array or incrementing existing ones. The final result in the second collection looks like this:

{
    "_id" : {
        "id_1"      : "1",
        "id_2"      : "2",
    },
    "value" :
    {
        "2_0"   :   [
            {
                "date"  :   ISODate("2014-10-01T00:00:00.000Z"),
                "val_1" : 1,
                "val_2" : 0,
                "count" : 1
            },
            {
                "date"  :   ISODate("2014-10-02T00:00:00.000Z"),
                "val_1" : 6,
                "val_2" : 2,
                "count" : 6
            }
        ],
        "2_1"   :   [
            {
                "date"  :   ISODate("2014-10-06T00:00:00.000Z"),
                "val_1" : 1,
                "val_2" : 0,
                "count" : 1
            },
            {
                "date"  :   ISODate("2014-10-07T00:00:00.000Z"),
                "val_1" : 6,
                "val_2" : 2,
                "count" : 6
            }
        ]
    }
}
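
To make the placeholders in the loop concrete: given this result shape, a single iteration boils down to something like the following (the exact query and update documents are a hypothetical reconstruction, not the real code):

// Hypothetical reconstruction of one iteration, based on the result shape above.
// Match one element of the "value.2_0" array by its date...
var query  = { "_id" : { "id_1" : "1", "id_2" : "2" },
               "value.2_0.date" : ISODate("2014-10-01T00:00:00.000Z") };
// ...and increment its counters in place via the positional $ operator.
var update = { "value.2_0.$.val_1" : 1, "value.2_0.$.val_2" : 0, "value.2_0.$.count" : 1 };
db.runCommand({ findAndModify : 'another_collection', query : query, update : { $inc : update } });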

The aggregation performance is great, but the bottleneck is inserting the aggregated data into the other collection, and I can't figure out why it is so slow.
The document count in another_collection is about 1,558,995 and growing.

I have done some research with iostat, mongostat, and mongotop.
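
The invocations were roughly as follows (the exact flags are an assumption on my part):

iostat -xm 1
mongostat 1
mongotop 1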

iostat:

Device:         rrqm/s   wrqm/s     r/s     w/s    rMB/s    wMB/s avgrq-sz avgqu-sz   await r_await w_await  svctm  %util
xvda              0.00     0.00    2.00   45.00     0.01     0.09     4.09     0.07    1.45    6.00    1.24   1.19   5.60
xvdb              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00
dm-0              0.00     0.00    2.00   18.00     0.01     0.09     9.60     0.06    3.00    6.00    2.67   2.80   5.60
dm-1              0.00     0.00    0.00    0.00     0.00     0.00     0.00     0.00    0.00    0.00    0.00   0.00   0.00

mongostat:

insert  query update delete getmore command flushes mapped  vsize    res faults             locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time 
    *0      1     39     *0       1   160|0       0   198g   403g  14.5g      1  db:28.9%          0       1|0     1|0   207k   231k    67   15:58:02 
    *0     *0     38     *0       0   178|0       0   198g   403g  14.6g      1  db:22.4%          0       2|0     1|0   235k   172k    67   15:58:03 
    *0     *0     34     *0       0   198|0       0   198g   403g  14.7g      0  db:68.1%          0       2|0     0|1   232k   186k    67   15:58:04 
    *0     *0     42     *0       0   270|0       0   198g   403g  14.6g      0 db:132.6%          0       0|0     0|1   298k   218k    67   15:58:05 
    *0      6     49     *0       5   183|0       0   198g   403g  14.7g      2 db:141.0%          0       0|0     0|0   292k     4m    67   15:58:06 
    *0      4     60     *0       4   495|0       0   198g   403g  14.7g      7  db:55.1%          0       0|0     0|1   411k   599k    67   15:58:07 
    *0     11     44     *0      10   296|0       0   198g   403g  14.5g      1  db:40.4%          0       0|0     1|0   299k   906k    67   15:58:08 
    *0      7     37     *0       6   184|0       0   198g   403g  14.5g      3  db:20.4%          0       3|1     0|1   221k   589k    67   15:58:09 
    *0      8     26     *0       9   139|0       0   198g   403g  14.5g      2  db:41.4%          0       0|0     1|2   157k   775k    67   15:58:10 
    *0      7     44     *0       7   245|0       0   198g   403g  14.5g      1  db:25.2%          0       1|0     1|0   285k   684k    67   15:58:11 
insert  query update delete getmore command flushes mapped  vsize    res faults             locked db idx miss %     qr|qw   ar|aw  netIn netOut  conn       time 
    *0      2     50     *0       2   222|0       0   198g   403g  14.5g      2  db:25.8%          0       2|0     1|0   249k   341k    67   15:58:12 
    *0     11     38     *0       9   151|0       0   198g   403g  14.6g      6  db:22.1%          0       2|0     1|0   224k   774k    67   15:58:13 
    *0      3     31     *0       3   147|0       0   198g   403g  14.7g      1  db:15.0%          0       2|0     0|1   215k   369k    67   15:58:14 
     1      5     30     *0       5   134|0       0   198g   403g  14.5g      1  db:30.9%          0       0|0     2|3   200k   495k    67   15:58:15 
    *0      1     14     *0       1    64|0       0   198g   403g  14.5g      0  db:24.3%          0       3|0     0|1    83k   197k    67   15:58:16

mongotop:

  ns                         total        read       write      2014-11-23T14:00:34
  db.system.namespaces      2751ms      2751ms         0ms
  db.another_collection     233ms         0ms       233ms

  ns                        total        read       write       2014-11-23T14:00:35
  db.system.namespaces      2858ms      2858ms         0ms
  db.another_collection     285ms         0ms       285ms

Any help would be very appreciated!

UPDATE

Well, I have figured out that the findAndModify queries weren't using indexes, but when I added the indexes, the I/O %util increased up to 100.

I guess the next step should be to implement Bulk operations ( http://docs.mongodb.org/manual/reference/method/js-bulk/ ) or to move to SSD drives.
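
For reference, the indexes were created along these lines (the key here is hypothetical, since it depends on what the findAndModify query actually filters on):

// Hypothetical: index whichever fields the findAndModify query filters on.
db.another_collection.ensureIndex({ "value.2_0.date" : 1 });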

Best Answer

You want to use bulk write operations instead of individual updates and inserts.

// BEFORE your iteration loop
var bulk = db.another_collection.initializeUnorderedBulkOp();

// INSIDE your iteration loop
// instead of db.another_collection.update(…)
bulk.find(query).update(yourUpdateDocumentHere);

// instead of db.another_collection.insert(…)
bulk.insert(yourDocumentToInsertHere);

// AFTER your iteration loop
bulk.execute();

This should speed up the write operations dramatically.
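
Note that the bulk builder queues all operations in memory until execute() is called. If your result set grows well beyond 17k entries, it may be worth flushing periodically; a minimal sketch, assuming the same query/update placeholders as in your question:

var bulk = db.another_collection.initializeUnorderedBulkOp();
var counter = 0;

cursor.result.forEach(function(data) {
    // Queue the update; inserts for missing documents can be queued the same way.
    bulk.find(query).update({ $inc : update });
    counter++;
    // Flush every 1000 queued operations to bound memory usage.
    if (counter % 1000 === 0) {
        bulk.execute();
        bulk = db.another_collection.initializeUnorderedBulkOp();
    }
});

// Flush whatever is left over.
if (counter % 1000 !== 0) {
    bulk.execute();
}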

EDIT: As for the bottleneck: extending arrays is a costly operation, because it forces MongoDB to relocate documents frequently once the allocated padding is exceeded. As a rule of thumb: if you have an array that you have to extend often, there is something wrong with your data model, and more often than not it is worth revising it.
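
For example, one possible remodeling (a sketch, untested against your workload): store one flat document per (ids, key, date) combination, so every write becomes a single in-place upsert and no array ever grows:

// Hypothetical alternative model: one flat document per (ids, key, date).
// Every write is a single in-place upsert; no array ever has to be extended.
db.another_collection.update(
    { "id_1" : "1", "id_2" : "2", "key" : "2_0",
      "date" : ISODate("2014-10-01T00:00:00.000Z") },
    { $inc : { "val_1" : 1, "val_2" : 0, "count" : 1 } },
    { upsert : true }
);

With a compound index on { id_1: 1, id_2: 1, key: 1, date: 1 }, the lookup stays indexed and the documents keep an essentially fixed size, which avoids the relocations described above.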