Performance limits of MongoDB

mongodb

There are 100 data files, each with 60 fields and over 4 million records. A Perl program inserts the records or updates them based on a user-defined _id field. There is also a History collection that stores every value ever written for three of the fields. A replica set with two servers and an arbiter has been set up. Initially the files loaded into MongoDB at 45 minutes per file, but after around 20 files the speed dropped considerably, to about 20 hours per file. The servers have started slowing down badly; even a simple logout command no longer returns quickly.

I have built a hashed index on the _id field, and for the History collection I have built a compound index on the id and date fields. The collections currently hold 4 million records in the actual data collection and around 100 million in the History collection. Each of the two servers has 17 GB of RAM, of which only about 3.5 GB is used according to the res column of mongostat. However, since the records must be inserted sequentially by date, I cannot exploit parallelism either.
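For reference, the index builds described above would have been issued with commands along these lines; the collection names data and history are my assumptions, not from the original setup:

```javascript
// Sketch of the described index builds; "data" and "history" are
// assumed collection names. Note that _id always carries an automatic
// unique index, so the hashed index is an additional one.
db.data.createIndex({ _id: "hashed" });      // hashed index on _id
db.history.createIndex({ id: 1, date: 1 });  // compound index on id + date
```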

Have the limits of MongoDB been reached for this specific scenario? Is this slowdown to be expected? I have run fsync manually every now and then to ensure the files are being written to disk. Are there other diagnostics I can run to better explain the situation? Is there a solution to this?

Thanks

Best Answer

Community wiki answer generated from comments on the question by Markus W Mahlberg:

You might want to use bulk operations during insert. Since not all of your RAM is used, it is safe to assume that either the disks or the inserting program is the limiting factor, and bulk operations speed things up in both cases. Something is seriously wrong: unless your fields are very big, 200k inserts per hour is a mere joke. You should have your code peer reviewed and make sure the I/O operations are efficient. See the production notes for further details on this.
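The saving from bulk operations comes from round trips: each batch costs one command instead of one per document. That grouping can be sketched in plain JavaScript (the helper name and the batch size of 1,000 are illustrative choices):

```javascript
// Sketch: group individual write operations into fixed-size batches so
// each batch costs one round trip instead of one per document.
function toBatches(ops, batchSize) {
  const batches = [];
  for (let i = 0; i < ops.length; i += batchSize) {
    batches.push(ops.slice(i, i + batchSize));
  }
  return batches;
}

// 4,500 queued upserts sent 1,000 at a time: 5 round trips, not 4,500.
const batches = toBatches(new Array(4500).fill(null), 1000);
console.log(batches.length);     // 5
console.log(batches[4].length);  // 500
```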

The Bulk API supports upserts: bulk.find({...}).upsert().update({...}). Furthermore, you create the bulk builder with var bulk = db.collection.initializeOrderedBulkOp().
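Put together, a batched upsert loop in the mongo shell might look like the following sketch. It assumes a collection named data and an iterable records of documents keyed by the user-defined _id; neither name comes from the original setup:

```javascript
// Sketch only: assumes a collection "data" and an iterable "records"
// of documents keyed by the user-defined _id.
var bulk = db.data.initializeOrderedBulkOp();
var count = 0;
records.forEach(function (rec) {
  bulk.find({ _id: rec._id })  // match on the user-defined _id
      .upsert()                // insert when no document matches
      .update({ $set: rec });  // otherwise update the existing one
  count += 1;
  if (count % 1000 === 0) {    // flush periodically to bound memory
    bulk.execute();
    bulk = db.data.initializeOrderedBulkOp();
  }
});
if (count % 1000 !== 0) {
  bulk.execute();              // flush the remainder
}
```

If the records do not have to be applied strictly in order, initializeUnorderedBulkOp() lets the server apply each batch in parallel, which is usually faster.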
