MongoDB – Should I index before or after importing a dump

index mongodb

I am using MongoDB to persist a very large file (90 GB) that contains nearly 40,000,000 items. I read and parse this file and insert all the items into MongoDB (I am using Perl with batch_insert, and I map each item to one MongoDB document).

Before inserting, I pre-created the indexes (about 10 index keys). I find that the insert speed cannot meet my needs (200 to 400 items per second). I know that too many index keys will definitely slow down inserts, especially once the collection becomes quite large. So I wonder whether I can build the indexes after I have loaded all the data into the database.

Can anyone tell me whether this approach works, and whether it will definitely save time?

Best Answer

Yes, you can build the indexes after the import; until then, the collection will only have the default _id index. This is also the recommended approach, because the resulting indexes will be more compact and more efficient (for similar reasons, foreground indexing is preferred over background indexing if you can afford it). It will take some time to complete though, especially with 10 indexes to build.

To build the indexes after the import, simply do not define any indexes until the import is complete, then use the ensureIndex() command to create the required indexes (in MongoDB 3.0 and later, ensureIndex() is deprecated in favor of createIndex()). The usual caveat applies: such index creation is resource intensive. For more information:

http://docs.mongodb.org/manual/core/index-creation/
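
As a rough illustration of the "load first, index afterwards" order described above, here is a minimal Python sketch. It is not the asker's Perl code. It assumes a collection object that behaves like a pymongo Collection (insert_many and create_index); the helper names batched and load_then_index, the batch size, and the field names are all hypothetical.

```python
from itertools import islice


def batched(items, size):
    """Yield lists of at most `size` items from any iterable."""
    iterator = iter(items)
    while True:
        batch = list(islice(iterator, size))
        if not batch:
            return
        yield batch


def load_then_index(collection, documents, index_fields, batch_size=1000):
    """Insert every document first, then build one index per field.

    `collection` is assumed to expose pymongo-style insert_many() and
    create_index(); `index_fields` holds placeholder field names.
    Returns the number of documents inserted.
    """
    inserted = 0
    # Phase 1: bulk load while only the default _id index exists.
    for batch in batched(documents, batch_size):
        collection.insert_many(batch)
        inserted += len(batch)

    # Phase 2: create the secondary indexes once all data is loaded.
    for field in index_fields:
        collection.create_index([(field, 1)])  # 1 = ascending order
    return inserted


# Hypothetical usage with pymongo (requires a running server):
#   from pymongo import MongoClient
#   coll = MongoClient()["mydb"]["items"]
#   load_then_index(coll, parsed_items, ["field_a", "field_b"])
```

Each create_index call above triggers a separate index build, so expect the second phase to take a while on 40,000,000 documents; the point is only that no secondary index has to be maintained while the inserts are running.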