Linux – SSD as a read cache for FREQUENTLY read data

bcache cache linux lvm ssd

I'm looking for ways to make use of an SSD to speed up my system. From “Linux equivalent to ReadyBoost?” (and the research it prompted) I've learned about bcache, dm-cache and EnhanceIO. All three of these seem capable of caching read data on SSD.

However, unless I'm missing something, all three seem to store a file/block/extent/whatever in cache the first time it is read. Large sequential reads might be an exception, but otherwise it seems as if every read cache miss would cause something to get cached. I'd like the cache to hold the data I read often. I'm worried that a search over the bodies of all my maildir files or a recursive grep in some large directory might evict large portions of the data I read far more often.
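To illustrate the worry, here is a toy simulation of a cache that inserts every block on its first miss and evicts in least-recently-used order. It is only a sketch of the general behaviour, not how bcache, dm-cache or EnhanceIO are actually implemented; the class name, block names and sizes are all made up for the example.

```python
from collections import OrderedDict


class LRUCache:
    """Toy LRU cache that inserts every block on its first miss (hypothetical model)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.blocks = OrderedDict()  # insertion order doubles as recency order

    def read(self, block):
        """Return True on a cache hit, False on a miss."""
        if block in self.blocks:
            self.blocks.move_to_end(block)  # mark as most recently used
            return True
        self.blocks[block] = True  # cache on first miss
        if len(self.blocks) > self.capacity:
            self.blocks.popitem(last=False)  # evict the least recently used block
        return False


def simulate_scan():
    """Count hits on a small hot set before and after one large scan."""
    cache = LRUCache(capacity=100)
    hot = [f"boot-{i}" for i in range(50)]  # stand-in for files read on every boot

    # Read the hot set repeatedly so it is fully cached.
    for _ in range(10):
        for block in hot:
            cache.read(block)
    hits_before = sum(cache.read(block) for block in hot)

    # One pass over a large, rarely read data set (e.g. grepping a maildir).
    for i in range(1000):
        cache.read(f"mail-{i}")
    hits_after = sum(cache.read(block) for block in hot)

    return hits_before, hits_after


if __name__ == "__main__":
    print(simulate_scan())
```

In this model a single scan larger than the cache pushes out the whole hot set, however often that set was read before.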

Is there any technology to cache frequently read files, instead of recently read ones? Something which builds up some form of active set or some such? I guess adaptive replacement might be a term describing what I'm after.

Lacking that, I wonder whether it might make sense to use LVM as a bottom layer, and build up several bcache-enabled devices on top of that. The idea is that e.g. mail reads would not evict caches for /usr and the like. Each mounted file system would get its own cache of fixed size, or none at all. Does anyone have experience with bcache on top of LVM? Is there a reason against this approach?

Any alternative suggestions are welcome as well. Note however that I'm looking for something ready for production use on Linux. I feel ZFS with its L2ARC feature doesn't fall in that category (yet), although you are welcome to argue that point if you are convinced of the opposite. The reason for LVM is that I want to be able to resize space allocated for those various file systems as needed, which is a pain using static partitioning. So proposed solutions should also provide that kind of flexibility.


Edit 1: Some clarifications.

My main concern is bootup time. I'd like to see all the files which are used for every boot readily accessible on that SSD. And I'd rather not have to worry about keeping the SSD in sync, e.g. after package upgrades (which occur rather often on my Gentoo testing system). If often-used data which I don't use during boot ends up in the cache as well, that's an added bonus. My current work project, for example, would be a nice candidate. But I'd guess 90% of the files I use every day will be used within the first 5 minutes after pressing the power button. One consequence of this aim is that approaches which wipe the cache after boot, like ZFS L2ARC apparently does, are not a feasible solution.

The answer by goldilocks moved the focus from cache insertion to cache eviction. But that doesn't change the fundamental nature of the problem. Unless the cache keeps track of how frequently an item is used, things might still drop out of the cache too soon. This is particularly true since I expect the files I use all the time to reside in the RAM cache from boot till shutdown, so they will be read from disk only once for every boot. The cache eviction policies I found for bcache and dm-cache, namely LRU and FIFO, would both evict those boot-time files in preference to other files read on that same working day. Thus my concern.

Best Answer

To the best of my understanding, dm-cache does what you are asking for. I could not find a definitive source for this, but its author has explained that he should have called it dm-hotspot, because it tries to find "hot spots", i.e. areas of high activity, and caches only those.

In the output of dmsetup status you will find two variables, namely read_promote_adjustment and write_promote_adjustment. The cache-policies file explains that

Internally the mq policy determines a promotion threshold. If the hit count of a block not in the cache goes above this threshold it gets promoted to the cache.

So by adjusting read_promote_adjustment and write_promote_adjustment you can determine what exactly you mean by frequently read/written data. Once the number of reads/writes exceeds this threshold, the block will be "promoted" to, that is, stored in, the cache.
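The promotion idea quoted above can be sketched roughly as follows. This is a simplified model and not the actual mq policy code: the class and parameter names are hypothetical, a single fixed threshold stands in for whatever dm-cache computes internally, and eviction is left out entirely (blocks are only promoted while there is free space).

```python
from collections import Counter


class PromotingCache:
    """Toy cache that promotes a block only after repeated misses (hypothetical model)."""

    def __init__(self, capacity, promote_threshold):
        self.capacity = capacity
        self.promote_threshold = promote_threshold  # assumed fixed for this sketch
        self.cached = set()
        self.hit_counts = Counter()  # per-block counts for blocks NOT yet cached

    def read(self, block):
        """Return True if served from the cache, False if served from the slow device."""
        if block in self.cached:
            return True
        self.hit_counts[block] += 1
        # Promote once the count goes above the threshold, as in the quoted text.
        if (self.hit_counts[block] > self.promote_threshold
                and len(self.cached) < self.capacity):
            self.cached.add(block)
            del self.hit_counts[block]
        return False


if __name__ == "__main__":
    cache = PromotingCache(capacity=10, promote_threshold=2)

    # A one-off scan touches many blocks once each; none gets promoted.
    for i in range(100):
        cache.read(f"mail-{i}")
    print(len(cache.cached))

    # A block that is read again and again eventually crosses the threshold.
    print([cache.read("libc") for _ in range(4)])
```

With a threshold like this, a single pass over a large directory never gets anything into the cache, while a block read repeatedly does after a few misses.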

Remember that this (pre-cache) metadata is usually kept in memory and only written to disk/SSD when the cache device is suspended.
