MongoDB – Is It Always Faster to Create Indexes After Loading Data?

index, mongodb, optimization, tokumx

I have a large number of records (~1 billion) that I need to load into MongoDB (actually TokuMX, but whatever). I need to create about 6 different indices on the collection. Is it always faster to load the data and then create the indices? When I look at Mongo's log file, it seems like Mongo does some kind of large operation (maybe a row count?) before actually starting index creation, and it does this for every index I create.

Will it always be faster to create the indices after loading the data?

If I wait until after loading the data, would it be faster to create all the indices in the background at the same time rather than creating them one by one?

Best Answer

Back in the day we would bulk load our data in this way:

1. Drop indexes
2. Load data in the order in which the clustered index would be built (i.e., you export the data already sorted that way)
3. After the load is completed, create the clustered index
4. Next, create any additional non-clustered indexes
5. Miller time (this was before I could afford decent beer)

That method always proved faster than leaving the indexes in place. However, this was for Sybase and SQL Server. I imagine other systems would be similar, but I can't say for certain.
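
For what it's worth, here is a rough sketch of how the same drop, load, then index sequence might look from Python. It is only an illustration, not something I have benchmarked on MongoDB or TokuMX. The function name `load_then_index`, the `batch_size` parameter, and the shape of `index_specs` are all made up for this example. The three methods it calls (`drop_indexes`, `insert_many`, `create_index`) are the ones a PyMongo `Collection` exposes, so the code assumes an object of that kind is passed in. Whether a current PyMongo release can talk to a TokuMX server is a separate question this sketch does not answer, and it does not model the "export in clustered-index order" step.

```python
from itertools import islice


def load_then_index(collection, documents, index_specs, batch_size=1000):
    """Drop indexes, bulk load in batches, then build each index once.

    `collection` is assumed to behave like a PyMongo Collection.
    `index_specs` is a list of key lists such as [("user_id", 1)].
    Returns the number of documents handed to insert_many.
    """
    # Step 1: drop existing indexes so the load does not maintain them.
    # (On a real server the mandatory _id index stays in place.)
    collection.drop_indexes()

    # Step 2: stream the documents in fixed-size batches.
    docs = iter(documents)
    loaded = 0
    while True:
        batch = list(islice(docs, batch_size))
        if not batch:
            break
        collection.insert_many(batch)
        loaded += len(batch)

    # Step 3: build the indexes one after another, after the load.
    for keys in index_specs:
        collection.create_index(keys)

    return loaded
```

With a real deployment you would pass in something like `MongoClient()["mydb"]["records"]` (hypothetical database and collection names) and a generator that yields your documents. Timing this against a run that keeps the indexes in place during the load is the only way to know which is faster for your data.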
