The resident memory size represents the number of pages in memory actually touched by the mongod
process. If that is significantly lower than the available memory, and your data exceeds the available memory (yours does), then it could simply be a case of mongod not having actively touched enough pages yet.
To determine if this is the case, run free -m; the output should look something like this:
$ free -m
             total       used       free     shared    buffers     cached
Mem:          3709       3484        224          0         84       2412
-/+ buffers/cache:        987       2721
Swap:         3836        156       3680
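To make the check concrete, here is a small sketch (in the same JavaScript used by the mongo shell, runnable under node) that parses output in the shape shown above and asks whether cached is close to total. The 80% threshold is an illustrative assumption, not an official cutoff.

```javascript
// Sketch of the check described above: is the filesystem cache ("cached")
// close to total memory? The 0.8 threshold is an illustrative assumption.
function cacheIsWarm(freeOutput, threshold = 0.8) {
    const memLine = freeOutput.split("\n").find((l) => l.startsWith("Mem:"));
    // Columns on the Mem: line are total, used, free, shared, buffers, cached
    const [total, , , , , cached] = memLine.split(/\s+/).slice(1).map(Number);
    return cached / total >= threshold;
}

const sample = [
    "             total       used       free     shared    buffers     cached",
    "Mem:          3709       3484        224          0         84       2412",
    "-/+ buffers/cache:        987       2721",
    "Swap:         3836        156       3680",
].join("\n");

console.log(cacheIsWarm(sample)); // false: 2412 / 3709 is about 0.65
```

With the numbers from my example the check returns false, which matches the interpretation below: the cache has plenty of room left.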
In my example, cached (2412) is not close to the total (3709), which means that not only has mongod not touched enough pages, the filesystem cache has not even been filled by pages being read from disk in general.
A quick remedy for this would be the touch command (added in 2.2) - it should be used with caution on large data sets as it will attempt to load everything into RAM even if the data is far too large to fit (causing a lot of disk IO and page faults). It will certainly fill up the memory effectively though :)
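For reference, touch is a database command, so a run from the mongo shell looks like this (the collection name "mytbl" is just the example name used later in this answer; adjust to your own):

```javascript
// Load the collection's data files and indexes into RAM (MongoDB 2.2+).
// Use with caution on data sets larger than RAM, as noted above.
db.runCommand({ touch: "mytbl", data: true, index: true })
```

You can pass data: true, index: true, or both, depending on what you want pre-warmed.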
If your cached value is close to the total available, then your issue is that a large number of pages being read into memory from disk are not relevant to (and hence not touched by) the mongod process. The usual candidate for this kind of discrepancy is readahead. I've already covered that particular topic elsewhere in detail, so I'll just link those two answers for future reading if necessary.
First, 2.3 was a development branch that became 2.4, so you should not be using it any longer - hopefully you mean 2.2, which is still supported and developed (though it too will reach end of life soon, as of writing this answer).
The group method you mention runs server-side JavaScript and is not going to be fast, especially in 2.2, which used the old SpiderMonkey engine and takes an exclusive JavaScript lock. Generally I would not recommend using it, especially with a sharded cluster, where it is unsupported.
Instead you should use the aggregation framework, which was added in 2.2, improved in 2.4, and will be even better in 2.6. The framework includes a $group operator, and many others, that you can use in a "pipeline" to achieve the results you want. If that seems odd, then I recommend the pipeline explanation docs here - for anyone familiar with the Linux/Unix shell, the concept should be very familiar.
To do a group on your field you would do something like this (it would be easier to be specific if you had actually included a sample document and your desired output):
db.mytbl.aggregate([
    { $group : { _id : { date : "$date_col" }, count : { $sum : 1 } } }
])
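If it helps to see what that $group stage computes, here is a plain-JavaScript sketch of the same grouping, runnable without a server. The sample documents and their date_col values are invented for illustration, since no sample document was included in the question.

```javascript
// Plain-JavaScript equivalent of the $group stage above: bucket documents
// by date_col and count each bucket, like { $sum : 1 } does per group.
// The sample documents below are made up for illustration.
const docs = [
    { date_col: "2013-05-01", value: 1 },
    { date_col: "2013-05-01", value: 7 },
    { date_col: "2013-05-02", value: 3 },
];

const counts = {};
for (const doc of docs) {
    counts[doc.date_col] = (counts[doc.date_col] || 0) + 1;
}

// Shape the buckets like the aggregation framework's output documents.
const results = Object.entries(counts).map(([date, count]) => ({
    _id: { date },
    count,
}));

console.log(results);
// [ { _id: { date: '2013-05-01' }, count: 2 },
//   { _id: { date: '2013-05-02' }, count: 1 } ]
```

The real pipeline does this server side, with no JavaScript lock, which is why it is preferred over the group method.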
You can find some more general examples here:
http://docs.mongodb.org/manual/applications/aggregation/
Best Answer
You will have to use the MongoDB aggregation framework.
Check this example: https://docs.mongodb.org/manual/tutorial/aggregation-zip-code-data-set/