Linux Kernel – How to Restrict Size of Buffer Cache

bufferlinux-kernel

Is there a way to tell the Linux kernel to only use a certain percentage of memory for the buffer cache? I know /proc/sys/vm/drop_caches can be used to clear the cache temporarily, but is there any permanent setting that prevents it from growing to more than e.g. 50% of main memory?

The reason I want to do this, is that I have a server running a Ceph OSD which constantly serves data from disk and manages to use up the entire physical memory as buffer cache within a few hours. At the same time, I need to run applications that will allocate a large amount (several 10s of GB) of physical memory. Contrary to popular belief (see the advice given on nearly all questions concerning the buffer cache), the automatic freeing up the memory by discarding clean cache entries is not instantaneous: starting my application can take up to a minute when the buffer cache is full (*), while after clearing the cache (using echo 3 > /proc/sys/vm/drop_caches) the same application starts nearly instantaneously.

(*) During this minute of startup time, the application is faulting in new memory but spends 100% of its time in the kernel, according to Vtune in a function called pageblock_pfn_to_page. This function seems to be related to memory compaction needed to find huge pages, which leads me to believe that actually fragmentation is the problem.

Best Answer

If you do not want an absolute limit but just pressure the kernel to flush out the buffers faster, you should look at vm.vfs_cache_pressure

This variable controls the tendency of the kernel to reclaim the memory which is used for caching of VFS caches, versus pagecache and swap. Increasing this value increases the rate at which VFS caches are reclaimed.

Ranges from 0 to 200. Move it towards 200 for higher pressure. Default is set at 100. You can also analyze your memory usage using the slabtop command. In your case, the dentry and *_inode_cache values must be high.

If you want an absolute limit, you should look up cgroups. Place the Ceph OSD server within a cgroup and limit the maximum memory it can use by setting the memory.limit_in_bytes parameter for the cgroup.

memory.memsw.limit_in_bytes sets the maximum amount for the sum of memory and swap usage. If no units are specified, the value is interpreted as bytes. However, it is possible to use suffixes to represent larger units — k or K for kilobytes, m or M for Megabytes, and g or G for Gigabytes.

References:

[1] - GlusterFS Linux Kernel Tuning

[2] - RHEL 6 Resource Management Guide

Related Question