MongoDB – Make MongoDB close open files

database-design mongodb

I am currently trialing numerous database designs (to solve this problem: Large (>22 trillion items) geospatial dataset with rapid (<1s) read query performance), and my latest design in MongoDB uses many (>20,000) collections. I am attempting a multi-collection design because I have been unable to achieve reasonable performance with a single collection, most notably when inserting new data, which I suspect is because the index becomes so large. From reading about design architecture, I realise a multi-collection design is not conventional, but that is not the subject of this question. If you have any alternative ideas, please respond to the question linked above instead.

My current design will divide my sample data set (100 million locations) into ~30,000 collections, which keeps each index small. Testing this design on a subset of ~150 collections (>200,000 locations) shows that my read and insert queries run with acceptable performance, and I believe that performance will hold when the design is scaled up. Based on the timings from this subset, the full 100 million locations should take about 15 hours to insert initially. However, when running my insert on the full sample set, after the creation of ~350 collections (in ~7 mins), I receive the error:

failed: 24: Too many open files

Upon reading about other people who have run into this (often with WiredTiger), I see that most of them overcome the problem by increasing the ulimit of the system.
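For reference, here is a minimal sketch of inspecting that limit from Python, assuming a Unix-like system where the standard `resource` module is available. It reports the limit of the Python process itself; the limit that matters here is the one mongod was started with, which may differ. The function name `describe_nofile_limit` is made up for illustration.

```python
import resource


def describe_nofile_limit():
    """Return the (soft, hard) open-file limits of the current process."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft, hard


soft, hard = describe_nofile_limit()
print(f"soft limit: {soft}, hard limit: {hard}")
```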

But for my use case I don't believe this will work, because the system will scale to a point where the number of files opened by the collection creation routine outgrows any sensible upper limit, not to mention that I don't want to keep all these files open! Each collection creation makes 3 new files (1 collection, 2 indexes), and to my knowledge there is no need to keep these files open after the collection has been made. The collections are generated by a simple Python loop that inserts the data for the first time, repeating the following tasks:

  1. Gather data from satellite output
  2. Determine collection name to update (upsert)
  3. Perform update (create collection, if required)
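The three steps above might look roughly like the sketch below. The grid-cell naming scheme, the record fields (`id`, `lat`, `lon`) and all function names are assumptions for illustration, since the question does not show its code. The only pymongo call relied on is `Collection.update_one(filter, update, upsert=True)`.

```python
def collection_name_for(lat, lon, cell_deg=1.0):
    """Hypothetical naming scheme: one collection per lat/lon grid cell."""
    row = int((lat + 90.0) // cell_deg)
    col = int((lon + 180.0) // cell_deg)
    return f"cell_{row}_{col}"


def upsert_location(db, record):
    """Steps 2 and 3: pick the collection, then upsert the record into it."""
    name = collection_name_for(record["lat"], record["lon"])
    # MongoDB creates the collection implicitly on the first write to it.
    db[name].update_one(
        {"_id": record["id"]},
        {"$set": {"lat": record["lat"], "lon": record["lon"]}},
        upsert=True,
    )
    return name


def load(db, records):
    """Step 1 is stood in for by iterating over already-gathered records."""
    touched = set()
    for record in records:
        touched.add(upsert_location(db, record))
    return touched
```

With pymongo, `db` would be something like `MongoClient()["geo"]`, where the database name is again an assumption.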

Once a collection has been made, it will never be accessed again during this routine, so any files created or opened during collection creation can be closed after step 3. Is it therefore possible to make MongoDB, or the server, close any/all/specific files it currently has open? Or is there a reason I am unaware of why mongo keeps these files open?
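To put rough numbers on this, the sketch below applies the figure of 3 files per collection from the question. The limit of 1024 is an assumption (a common default soft limit on Linux); the question does not state the actual limit, and the estimate ignores any other descriptors mongod holds, such as sockets and journal files.

```python
FILES_PER_COLLECTION = 3  # from the question: 1 collection file + 2 index files


def files_needed(n_collections, files_per_collection=FILES_PER_COLLECTION):
    """Files held open if every created collection's files stay open."""
    return n_collections * files_per_collection


def collections_until_limit(open_file_limit,
                            files_per_collection=FILES_PER_COLLECTION):
    """Collections that fit before the open-file limit is reached."""
    return open_file_limit // files_per_collection


print(files_needed(350))              # 1050
print(files_needed(30_000))           # 90000
print(collections_until_limit(1024))  # 341
```

Under that assumed limit, failing after ~350 collections is in the expected ballpark, and the full ~30,000 collections would need on the order of 90,000 open files.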

Best Answer

If you are sure that mongod does not open all available database files when it starts, then you could gracefully stop your mongod instance every so often during the insert and start it again, for example when the open-file count approaches the ulimit on your system. This closes all open files, and after the restart only the files of the collections you go on to create are opened.
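One way to decide when such a restart is due is sketched below, assuming Linux, where the open descriptors of a process are listed under `/proc/<pid>/fd`. Reading that directory for mongod requires running as the same user or as root. The function names and the 90% threshold are assumptions, and the stop and start steps themselves are left out.

```python
import os


def count_open_fds(pid):
    """Count the open file descriptors of a process via /proc (Linux only)."""
    return len(os.listdir(f"/proc/{pid}/fd"))


def should_restart(open_fds, soft_limit, threshold=0.9):
    """True once the open-file count reaches the given fraction of the limit."""
    return open_fds >= soft_limit * threshold


# Example decision with made-up numbers; in practice open_fds would come
# from count_open_fds(mongod_pid) and soft_limit from mongod's ulimit.
print(should_restart(open_fds=950, soft_limit=1024))
print(should_restart(open_fds=100, soft_limit=1024))
```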

Another solution is to adjust the number of collections to an appropriate level.

Hope this helps.