My answer is: it depends.
If you are accessing files by the _id field, which is indexed by default, then you won't need to add more memory any time soon.
The _id field, which is of type ObjectId, is 12 bytes (96 bits) in size, so it can hold up to 2^96 distinct values. 3 of those bytes are a machine ID (a hash that has a fixed value on a given machine), so they can be subtracted, which still leaves you roughly 2^72 possible files per machine. For reference, 2^20 is 1,048,576.
In terms of memory, the index on the _id field needs 10,000,000 x 12 bytes = ~114 MiB for the keys alone. To be honest, I don't know how much overhead there will be for an index holding 10 million values, but I don't think it will need more than 1 GiB.
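If you want to sanity-check those numbers, the mongo shell doubles as a calculator, and db.collection.stats() reports the actual index sizes. A rough sketch (fs.files is the default GridFS files collection; your bucket prefix may differ):

    // back-of-the-envelope check in the mongo shell
    Math.pow(2, 72)                    // ~4.7e21 possible _ids per machine
    10000000 * 12 / (1024 * 1024)      // ~114 MiB of raw _id key data

    // and the real figure, B-tree overhead included:
    db.fs.files.stats().indexSizes     // bytes used by each index on fs.files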
Now, if your _id field is not of type ObjectId, then redo the math with your own key size.
In GridFS, the filename field of the files collection is also indexed. If you are not accessing files by filename, you may leave it blank and drop the index on filename.
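For example, you could inspect and drop that index from the shell (a sketch; the exact key pattern depends on which driver created the index, so match whatever getIndexes() shows):

    db.fs.files.getIndexes()                  // list the indexes your driver created
    db.fs.files.dropIndex({ filename : 1 })   // drop it if you never query by filename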
On the other hand, if you will be adding metadata to your files and want to query the files by that metadata, then you should have indexes on those metadata fields, and you should do the math again.
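Such a metadata index would look something like this (metadata.owner is a made-up field, purely for illustration):

    db.fs.files.ensureIndex({ "metadata.owner" : 1 })   // supports queries on that field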
I have a production environment with over 3,000,000 PDF files (taking 180 GB of disk space). My server is a virtual server with 4 vCPUs and 4 GB of RAM, and there is still no problem. The specs you provided are way beyond your needs; you could store billions of files with those servers, especially if you have SSDs, because even if your indexes do not fit into memory, paging them in will be so fast that you won't even notice a slowdown.
Yes, you can index them after you have imported the data (at that point only the default _id index will exist on the collection). This is also the recommended approach, because the resulting indexes will be more compact and more efficient (for similar reasons, foreground index builds are preferred over background builds if you can afford them). It will take some time to complete, though, especially with 10 indexes to build.
To build them after the import, simply do not define any indexes until your import is complete, then use the ensureIndex() command to create each required index afterwards (with the usual caveat that such index creation is resource intensive). For more information:
http://docs.mongodb.org/manual/core/index-creation/
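As a sketch (mycollection, fieldA, fieldB, and fieldC are placeholders for your own names):

    // run these only once the bulk import has finished
    db.mycollection.ensureIndex({ fieldA : 1 })
    db.mycollection.ensureIndex({ fieldB : 1, fieldC : -1 })
    // ...and so on for each of the 10 indexes; each build scans the full collection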
There is no priority system currently for writes (or reads, though you can send reads to secondaries); the closest thing you will get is yielding. For long-running operations, and for operations that it predicts will page data in from disk, MongoDB will yield the lock and allow other operations through, essentially interleaving operations:
http://www.mongodb.org/display/DOCS/How+does+concurrency+work
If you want to make sure that the less important writes are throttled somewhat, you can rate-limit them by issuing them with a replicated write concern such as w=2, REPLICAS_SAFE, or similar (depending on your driver). See the link below for the getLastError command behind such write concerns on the MongoDB side, and take a look at your driver's docs for the relevant equivalent there:
http://www.mongodb.org/display/DOCS/getLastError+Command#getLastErrorCommand-%7B%7Bw%7D%7D
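From the shell, that boils down to something like this (lowpriority and logLine are hypothetical names):

    db.lowpriority.insert({ logLine : "example" })   // the less important write
    db.runCommand({ getLastError : 1, w : 2 })       // blocks until 2 members have it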
There would then be a slight delay as each such write waits for replication out to the secondaries, giving your other, more important writes their shot at the write lock.
In terms of the future, with 2.2 due out shortly, you will get database-level locking, so as long as your two different profiles/priorities live in different databases you should have no lock contention (I/O and RAM contention may still exist, of course).
Finally, in terms of other things to look at for the line-by-line type of read, I would look at capped collections and tailable cursors; see if they fit your use case:
http://www.mongodb.org/display/DOCS/Tailable+Cursors
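A minimal shell sketch of that pattern (the collection name lines and the 100 MB cap are assumptions; pick your own):

    // a capped collection preserves insertion order, which tailable cursors require
    db.createCollection("lines", { capped : true, size : 104857600 })

    // tail it like `tail -f`, blocking briefly for new documents
    var cursor = db.lines.find().addOption(DBQuery.Option.tailable | DBQuery.Option.awaitData);
    while (cursor.hasNext()) {
        printjson(cursor.next());   // process each line as it arrives
    }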