I have a large number of records (~1 billion) that I need to load into MongoDB (actually TokuMX, but whatever). I have about 6 different indices I need to create on the collection. Is it always faster to load the data and then create the indices? When I look at Mongo's log file, it seems like Mongo does some kind of large operation (maybe a row count?) before actually starting index creation, and it does this for every index I create.
Will it always be faster to create the indices after loading the data?
If I wait until after loading the data, would it be faster to create all the indices in the background at the same time rather than creating them one by one?
Best Answer
Back in the day we would bulk load our data in this way:
That method always proved faster than leaving the indexes in place. However, this was for Sybase and SQL Server. I imagine other systems would be similar, but I can't say for certain.
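If you want to measure the difference rather than take it on faith, here is a minimal sketch of the comparison. It uses Python's built-in sqlite3 purely as a stand-in engine, not MongoDB, TokuMX, Sybase or SQL Server, and the table, column and index names are made up for illustration. It only shows the shape of the two approaches (load first then index, versus index first then load); the timings it prints say nothing definitive about any other system.

```python
import sqlite3
import time


def make_rows(n):
    # Synthetic rows; the columns are hypothetical and exist only for this sketch.
    return [(i, i % 1000, f"v{i}") for i in range(n)]


def load_then_index(rows):
    """Bulk insert into an unindexed table, then build the indexes."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER, a INTEGER, b TEXT)")
    with conn:  # one transaction for the whole load
        conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_a ON records (a)")
    conn.execute("CREATE INDEX idx_b ON records (b)")
    return conn


def index_then_load(rows):
    """Create the indexes first, so every insert also has to maintain them."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (id INTEGER, a INTEGER, b TEXT)")
    conn.execute("CREATE INDEX idx_a ON records (a)")
    conn.execute("CREATE INDEX idx_b ON records (b)")
    with conn:
        conn.executemany("INSERT INTO records VALUES (?, ?, ?)", rows)
    return conn


def timed(fn, rows):
    # Returns the open connection and the elapsed wall-clock seconds.
    start = time.perf_counter()
    conn = fn(rows)
    return conn, time.perf_counter() - start


if __name__ == "__main__":
    rows = make_rows(100_000)
    for fn in (load_then_index, index_then_load):
        conn, seconds = timed(fn, rows)
        print(f"{fn.__name__}: {seconds:.3f}s")
        conn.close()
```

Run it against a realistic sample of your own data and your own engine's client before trusting the result, since index build cost depends heavily on the storage engine and on the order in which the data arrives.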