My answer is: it depends.
If you are accessing files by the _id field, which is already indexed, then you won't need to add more memory any time soon.
The _id field, which is of type ObjectId, is 12 bytes in size. That means it can hold up to 2^(12*8) = 2^96 distinct values. Three of those bytes are the machine ID, a hash that has a fixed value on any given machine, so they can be subtracted, which gives you approximately 2^72 distinct files. For reference, 2^20 is 1,048,576.
In terms of memory, the index on the _id field needs roughly 10,000,000 x 12 bytes ≈ 114 MiB. To be honest, I don't know how much overhead there will be for an index holding 10 million values, but I don't think it will need more than 1 GiB.
Now, if your _id field is not of type ObjectId, then redo the math for your key type.
In GridFS, the filename field of the files collection is also indexed. If you are not accessing files by filename, you can leave it blank and drop the filename index.
On the other hand, if you add metadata to the files you store and want to query files by that metadata, then you should have indexes on those metadata fields and do the math again.
I have a production environment with over 3,000,000 PDF files (taking 180 GB of disk space). The server is a virtual machine with 4 vCPUs and 4 GB of RAM, and it still has no problems. The specs you provided are way too high for your needs; you could store billions of files with those servers, especially if you have SSDs: even if your indexes do not fit into memory, swapping will be so fast that you won't even notice a slowdown.
Best Answer
Here are several examples of YAML configs for Linux (Windows paths and options are a little different), essentially setting some defaults explicitly along with commonly used settings.
First, a standalone `mongod` with the default port, path, and journal settings - this would be the type of configuration used for local testing, with a few extras to show the general style:
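A sketch of such a standalone config (the file paths here are illustrative assumptions; adjust for your system):

```yaml
# mongod.conf - standalone for local testing; paths are examples
systemLog:
  destination: file
  path: "/var/log/mongodb/mongod.log"
  logAppend: true
storage:
  dbPath: "/data/db"            # default data path, set explicitly
  journal:
    enabled: true               # explicit default
processManagement:
  fork: true
net:
  port: 27017                   # default port, set explicitly
  bindIp: 127.0.0.1
  wireObjectCheck: false        # skip wire-protocol validation; testing only
```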
You would not normally want object validation disabled (`wireObjectCheck: false`) in production, but for a bulk load of data for testing purposes it will speed things up a little and is a minimal risk in such an environment.

Now, let's look at a sample config file for a typical production replica set member with authentication enabled, running as part of a sharded cluster:
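A sketch of such a replica set member config, with keyfile-based auth (the set name, port, and paths are illustrative assumptions):

```yaml
# mongod.conf - replica set member in a sharded cluster; values are examples
systemLog:
  destination: file
  path: "/var/log/mongodb/mongod.log"
  logAppend: true
storage:
  dbPath: "/data/db"
  journal:
    enabled: true
processManagement:
  fork: true
  pidFilePath: "/var/run/mongod.pid"
net:
  port: 27018
security:
  keyFile: "/data/key/mongodb-keyfile"   # shared key enables internal auth
  authorization: enabled
replication:
  replSetName: "rs0"
  oplogSizeMB: 10240
sharding:
  clusterRole: shardsvr
```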
Next up, a sample `mongos` config. The only required changes here are removals of the settings that don't apply to the `mongos` (since it does not store data) and the addition of the `configDB` string, which must be identical on all `mongos` processes. I added the maximum connections setting as an example; it's not required, but it can often be a good idea for larger clusters.

Rounding out the sharded cluster, we have a sample config server, which is really a subset of the replica set member config with some minor changes:
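Sketches of both remaining pieces (hostnames, ports, and file paths here are illustrative assumptions): first the `mongos` config just described, then the config server:

```yaml
# mongos.conf - sketch; no storage section, since mongos stores no data
systemLog:
  destination: file
  path: "/var/log/mongodb/mongos.log"
  logAppend: true
processManagement:
  fork: true
net:
  port: 27017
  maxIncomingConnections: 5000   # example only; tune for your cluster
security:
  keyFile: "/data/key/mongodb-keyfile"
sharding:
  # must be identical on every mongos (comma-separated config server list)
  configDB: "cfg1.example.net:27019,cfg2.example.net:27019,cfg3.example.net:27019"
```

```yaml
# configsvr.conf - sketch; a subset of the replica set member config
systemLog:
  destination: file
  path: "/var/log/mongodb/mongod.log"
  logAppend: true
storage:
  dbPath: "/data/configdb"
  journal:
    enabled: true
processManagement:
  fork: true
net:
  port: 27019
security:
  keyFile: "/data/key/mongodb-keyfile"
sharding:
  clusterRole: configsvr
```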
Finally, MongoDB 3.0 (not yet released at the time of writing) will introduce several new options, especially with the introduction of the new storage engines. Therefore, here is an example of how to configure the same replica set member, but this time with the WiredTiger storage engine and the (default) snappy compression method (note: altered from the original because of SERVER-16266, with a sample `engineConfig` added). As a final bonus, I also showed how to bind multiple IP addresses using a list, in this case an external IP and the loopback IP:
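A sketch of that WiredTiger variant (the cache size, addresses, and paths are illustrative assumptions):

```yaml
# mongod.conf - same replica set member, WiredTiger engine (MongoDB 3.0+)
systemLog:
  destination: file
  path: "/var/log/mongodb/mongod.log"
  logAppend: true
storage:
  dbPath: "/data/db"
  engine: wiredTiger
  wiredTiger:
    engineConfig:
      cacheSizeGB: 8               # sample engineConfig setting
    collectionConfig:
      blockCompressor: snappy      # the default compressor, set explicitly
processManagement:
  fork: true
net:
  port: 27018
  bindIp: [192.0.2.10, 127.0.0.1]  # external IP plus loopback, as a list
security:
  keyFile: "/data/key/mongodb-keyfile"
replication:
  replSetName: "rs0"
  oplogSizeMB: 10240
sharding:
  clusterRole: shardsvr
```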