I have some data which is aggregated in chunks and later pushed into another collection.
The problem is that pushing the aggregated data into the other collection is very slow (it takes several hours), and I can't optimize it because I can't find the bottleneck.
MongoDB is running on a single node with 24 GB of RAM.
Basically, the whole process looks like this:
// Aggregated results, about 17k entries
var cursor = db.collection.runCommand("aggregate", { ... });
Then I iterate over it:
cursor.result.forEach(function(data, index) {
    // Try to increment the matching array element in place
    var inc = db.runCommand({ findAndModify: 'another_collection', query : { query }, update : { $inc: update } });
    if (inc.value == null) {
        // Nothing matched the $inc, so push a new array element
        var up = db.another_collection.update(query, { $push : push });
        if (up.nMatched == 0) {
            // The document does not exist yet, so create it
            db.another_collection.insert({});
        }
    }
});
In the code above I am just pushing values onto an array or incrementing existing ones. The final result in the second collection looks like this:
{
"_id" : {
"id_1" : "1",
"id_2" : "2",
},
"value" :
{
"2_0" : [
{
"date" : ISODate("2014-10-01T00:00:00.000Z"),
"val_1" : 1,
"val_2" : 0,
"count" : 1
},
{
"date" : ISODate("2014-10-02T00:00:00.000Z"),
"val_1" : 6,
"val_2" : 2,
"count" : 6
}
],
"2_1" : [
{
"date" : ISODate("2014-10-06T00:00:00.000Z"),
"val_1" : 1,
"val_2" : 0,
"count" : 1
},
{
"date" : ISODate("2014-10-07T00:00:00.000Z"),
"val_1" : 6,
"val_2" : 2,
"count" : 6
}
]
}
}
The aggregation performance itself is great; the bottleneck is inserting the aggregated data into the other collection, and I can't figure out why it is so slow.
another_collection
currently holds about 1,558,995 documents and is growing.
I have done some research with iostat, mongostat and mongotop:
I/O stat:
Device: rrqm/s wrqm/s r/s w/s rMB/s wMB/s avgrq-sz avgqu-sz await r_await w_await svctm %util
xvda 0.00 0.00 2.00 45.00 0.01 0.09 4.09 0.07 1.45 6.00 1.24 1.19 5.60
xvdb 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
dm-0 0.00 0.00 2.00 18.00 0.01 0.09 9.60 0.06 3.00 6.00 2.67 2.80 5.60
dm-1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
MongoStat:
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
*0 1 39 *0 1 160|0 0 198g 403g 14.5g 1 db:28.9% 0 1|0 1|0 207k 231k 67 15:58:02
*0 *0 38 *0 0 178|0 0 198g 403g 14.6g 1 db:22.4% 0 2|0 1|0 235k 172k 67 15:58:03
*0 *0 34 *0 0 198|0 0 198g 403g 14.7g 0 db:68.1% 0 2|0 0|1 232k 186k 67 15:58:04
*0 *0 42 *0 0 270|0 0 198g 403g 14.6g 0 db:132.6% 0 0|0 0|1 298k 218k 67 15:58:05
*0 6 49 *0 5 183|0 0 198g 403g 14.7g 2 db:141.0% 0 0|0 0|0 292k 4m 67 15:58:06
*0 4 60 *0 4 495|0 0 198g 403g 14.7g 7 db:55.1% 0 0|0 0|1 411k 599k 67 15:58:07
*0 11 44 *0 10 296|0 0 198g 403g 14.5g 1 db:40.4% 0 0|0 1|0 299k 906k 67 15:58:08
*0 7 37 *0 6 184|0 0 198g 403g 14.5g 3 db:20.4% 0 3|1 0|1 221k 589k 67 15:58:09
*0 8 26 *0 9 139|0 0 198g 403g 14.5g 2 db:41.4% 0 0|0 1|2 157k 775k 67 15:58:10
*0 7 44 *0 7 245|0 0 198g 403g 14.5g 1 db:25.2% 0 1|0 1|0 285k 684k 67 15:58:11
insert query update delete getmore command flushes mapped vsize res faults locked db idx miss % qr|qw ar|aw netIn netOut conn time
*0 2 50 *0 2 222|0 0 198g 403g 14.5g 2 db:25.8% 0 2|0 1|0 249k 341k 67 15:58:12
*0 11 38 *0 9 151|0 0 198g 403g 14.6g 6 db:22.1% 0 2|0 1|0 224k 774k 67 15:58:13
*0 3 31 *0 3 147|0 0 198g 403g 14.7g 1 db:15.0% 0 2|0 0|1 215k 369k 67 15:58:14
1 5 30 *0 5 134|0 0 198g 403g 14.5g 1 db:30.9% 0 0|0 2|3 200k 495k 67 15:58:15
*0 1 14 *0 1 64|0 0 198g 403g 14.5g 0 db:24.3% 0 3|0 0|1 83k 197k 67 15:58:16
MongoTop:
ns total read write 2014-11-23T14:00:34
db.system.namespaces 2751ms 2751ms 0ms
db.another_collection 233ms 0ms 233ms
ns total read write 2014-11-23T14:00:35
db.system.namespaces 2858ms 2858ms 0ms
db.another_collection 285ms 0ms 285ms
Any help would be much appreciated!
UPDATE
Well, I have figured out that findAndModify wasn't using indexes; but when I added indexes, I/O %util went up to 100.
I guess the next step should be to try bulk operations ( http://docs.mongodb.org/manual/reference/method/js-bulk/ ) or to move the disks to SSDs.
Best Answer
You want to use bulk write operations instead of individual updates and inserts.
This should speed up the write operations dramatically.
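As a sketch of what that could look like for the loop in the question: the helper below only builds the list of upserted $inc operations (plain JavaScript), and the comment shows roughly how they would be fed to the shell's unordered Bulk API (MongoDB 2.6+). The match on _id and the counter field are assumptions, since the question elides the real query and update documents.

```javascript
// Build one upserted $inc operation per aggregated entry, so the whole
// batch can be sent in a single bulk.execute() round trip instead of a
// findAndModify / update / insert sequence per entry.
// The _id match and the "value.count" field are hypothetical placeholders.
function buildBulkOps(results) {
  return results.map(function (data) {
    return {
      query:  { _id: data._id },                       // match on the compound _id
      update: { $inc: { "value.count": data.count } }  // hypothetical counter field
    };
  });
}

// In the mongo shell this would then be applied roughly as:
//   var bulk = db.another_collection.initializeUnorderedBulkOp();
//   buildBulkOps(cursor.result).forEach(function (op) {
//     bulk.find(op.query).upsert().updateOne(op.update);
//   });
//   bulk.execute();   // one round trip for all ~17k entries
```

An unordered bulk op also lets the server apply the writes in parallel, since your per-entry updates are independent of each other.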
EDIT: as for the bottleneck, extending arrays is a costly operation, since it frequently forces MongoDB to relocate documents once their padding is exceeded. As a rule of thumb: if you have an array which you have to extend often, there is something wrong with your data model. More often than not, it is worth revising it.
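Purely as an illustration of such a revision: if the per-day entries were keyed by a date string instead of being $push-ed onto arrays, every write would become an in-place, upsertable $inc on a dotted path, and documents would grow far more predictably. The helper below builds such an update document; the bucket name and stat fields are hypothetical, modelled on the sample document in the question.

```javascript
// Sketch: instead of pushing { date, val_1, val_2, count } elements into
// "value.2_0", store each day's stats under a date key so a single upserted
// $inc can both create and increment them. Field names are hypothetical.
function buildDailyInc(bucket, date, stats) {
  var day = date.toISOString().slice(0, 10);          // e.g. "2014-10-01"
  var prefix = "value." + bucket + "." + day + ".";   // dotted path into the doc
  var inc = {};
  Object.keys(stats).forEach(function (k) {
    inc[prefix + k] = stats[k];                       // e.g. value.2_0.2014-10-01.val_1
  });
  return { $inc: inc };
}

// Usage in the shell (sketch):
//   db.another_collection.update(
//     { _id: data._id },
//     buildDailyInc("2_0", new Date("2014-10-01"), { val_1: 1, val_2: 0, count: 1 }),
//     { upsert: true }
//   );
```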