It's not a matter of the number of operations you will be doing, but rather the number of connections and (possibly, though it would be unusual for this to play a big part) the number of server-side JavaScript operations you plan to run (mainly Map/Reduce).
To explain: there will be a thread (and a file descriptor) for each connection made to/from the `mongod` process (and similarly for `mongos`) - therefore it is generally a good idea to have both values set beyond the hard-coded 20,000 connection limit in MongoDB. You can see this if you run `htop`, or a command like the following, while you spin up new connections to the `mongod` or `mongos` processes:

    ps uH p <PID_OF_YOUR_PROCESS> | wc -l
Most users will never get anywhere near these maximums, so this is merely a precaution on most systems to avoid problems with low ulimits. In a large cluster with many `mongos` processes you may see levels approaching the limit, but unless you are planning a deployment of that scale you will not have to worry.
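To see where you stand, you can check the relevant per-process limits for the user that runs `mongod`/`mongos`. This is a minimal sketch using the standard `ulimit` shell builtin; `-n` covers open file descriptors and `-u` covers processes/threads:

```shell
# Soft limit on open file descriptors (one is consumed per connection)
ulimit -n

# Soft limit on user processes; each connection's thread counts against this
ulimit -u
```

On most Linux systems these can be raised persistently in `/etc/security/limits.conf` (or via systemd unit settings), though the exact mechanism varies by distribution.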
For more information on the Map Reduce side of things, there is an excellent article on it, which includes thread use here.
This is a common misconception, i.e. that yields are somehow causing the slowness. In fact they are a symptom, not a cause. Even if there is no lock contention that requires a yield (writes basically), the queries still yield when they have to page from disk. They then re-acquire the lock when a certain amount of paging is done and look to yield again if more paging is needed (repeat until complete). If there is no lock contention from writes, then this is all pretty much instantaneous and does not add to the overall execution time.
If a query yields a lot, then it was hitting disk a lot, and that disk access is the cause of the slowness. Hence, `numYields` is just a way to infer that it was indeed paging from disk that caused the query to be slow. If you want those queries to be fast, then you need to have that data set in memory, and have enough memory for it to stay there long term and not be evicted.
Note: by default the kernel uses an LRU (least recently used) policy to decide what gets evicted, so the likely candidates for slowness are queries on (large) parts of your data set that are not accessed very often.
There is no way to limit `numYields`, and it wouldn't really make sense to do so, but yes, the remedy is to identify the data being addressed by those slow queries and make it fit into memory (note: the first query on any data will still be slow unless you pre-heat the cache in some way; the second query will hit memory and be fast).
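One low-tech way to identify those queries is to look for high `numYields` values in the slow-query lines of the `mongod` log. The log line below is a made-up illustration (the exact format varies by MongoDB version); in practice you would grep your actual log file:

```shell
# Illustrative slow-query log line (hypothetical example; real format varies
# by MongoDB version) - extract the numYields counter from it with grep.
logline='Mon Jul  1 12:00:00 [conn42] query test.coll ... numYields:87 nreturned:10 2300ms'
echo "$logline" | grep -oE 'numYields:[0-9]+'
```

Sorting such lines by the extracted value is a quick way to find which queries are paging the most.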
The default stack size for MongoDB threads is already 1024KB, not 8192KB (it is set in the code, not as a system setting) and has been since version 1.8.3 (see SERVER-2707), so you are already seeing the benefits of a lower stack size.
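For comparison, you can check the system's default stack size limit (the value a thread would inherit if MongoDB did not set its own) with `ulimit`; on many Linux distributions this reports 8192 (KB):

```shell
# Soft limit on stack size, reported in kilobytes (or "unlimited").
# MongoDB overrides this per-thread in its own code, so this only shows
# the inherited system default, not what mongod threads actually use.
ulimit -s
```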