Linux Kernel – How to Restrict Size of Buffer Cache

buffer, linux-kernel

Is there a way to tell the Linux kernel to only use a certain percentage of memory for the buffer cache? I know /proc/sys/vm/drop_caches can be used to clear the cache temporarily, but is there any permanent setting that prevents it from growing to more than e.g. 50% of main memory?

The reason I want to do this is that I have a server running a Ceph OSD which constantly serves data from disk and fills the entire physical memory with buffer cache within a few hours. At the same time, I need to run applications that allocate a large amount (several tens of GB) of physical memory. Contrary to popular belief (see the advice given on nearly all questions concerning the buffer cache), automatically freeing memory by discarding clean cache entries is not instantaneous: starting my application can take up to a minute when the buffer cache is full (*), while after clearing the cache (using echo 3 > /proc/sys/vm/drop_caches) the same application starts almost instantly.

(*) During this minute of startup time, the application is faulting in new memory but, according to VTune, spends 100% of its time in the kernel in a function called pageblock_pfn_to_page. This function seems to be related to the memory compaction needed to find huge pages, which leads me to believe that fragmentation is the actual problem.

Best Answer

If you do not need an absolute limit and just want to pressure the kernel to flush out the buffers faster, look at vm.vfs_cache_pressure:

This variable controls the tendency of the kernel to reclaim the memory which is used for caching of VFS caches, versus pagecache and swap. Increasing this value increases the rate at which VFS caches are reclaimed.

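The default is 100. Values above 100 (for example 200) increase the pressure, and values below 100 decrease it. As the description says, this setting acts on the VFS caches (dentries and inodes) rather than on the page cache itself. You can analyze this part of your memory usage with the slabtop command; in your case, the dentry and *_inode_cache values are likely to be high.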
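As a minimal sketch of how the setting could be inspected and changed from a script, the helpers below read and write the sysctl through procfs. The helper names are hypothetical, the path is the usual procfs location of this sysctl, and writing to it requires root.

```python
from pathlib import Path

# Usual procfs location of the sysctl; writing to it requires root.
VFS_CACHE_PRESSURE = Path("/proc/sys/vm/vfs_cache_pressure")


def read_pressure(path: Path = VFS_CACHE_PRESSURE) -> int:
    """Return the current vfs_cache_pressure value."""
    return int(path.read_text().strip())


def write_pressure(value: int, path: Path = VFS_CACHE_PRESSURE) -> None:
    """Set vfs_cache_pressure; the change is lost at the next reboot."""
    if value < 0:
        raise ValueError("vfs_cache_pressure must not be negative")
    path.write_text(f"{value}\n")
```

Calling write_pressure(200) as root would raise the pressure until the next reboot. To make the value persistent, the usual approach is a line such as vm.vfs_cache_pressure = 200 in /etc/sysctl.conf.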

If you want an absolute limit, you should look up cgroups. Place the Ceph OSD server within a cgroup and limit the maximum memory it can use by setting the memory.limit_in_bytes parameter for the cgroup.

memory.memsw.limit_in_bytes sets the maximum for the sum of memory and swap usage. If no unit is specified, the value is interpreted as bytes. Suffixes select larger units: k or K for kilobytes, m or M for megabytes, and g or G for gigabytes.
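To illustrate the suffix rules, here is a small sketch that converts such a value into bytes and derives a limit equal to a fraction of total memory (the 50% from the question, for instance). The function names are hypothetical, and the sketch assumes the suffixes are powers of 1024, which is how the kernel parses them.

```python
# Multipliers for the k/m/g suffixes, assumed to be powers of 1024.
_UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3}


def parse_size(text: str) -> int:
    """Convert a value such as '512M' or '40g' into a byte count."""
    text = text.strip()
    suffix = text[-1:].lower()
    if suffix in _UNITS:
        return int(text[:-1]) * _UNITS[suffix]
    return int(text)  # no suffix: the value is already in bytes


def limit_for_fraction(total_bytes: int, fraction: float) -> int:
    """Byte limit that corresponds to a fraction of total memory."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    return int(total_bytes * fraction)
```

For a machine with 128 GiB of RAM, limit_for_fraction(parse_size("128G"), 0.5) gives the byte value that could be written to memory.limit_in_bytes for the OSD's cgroup.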

References:

[1] - GlusterFS Linux Kernel Tuning

[2] - RHEL 6 Resource Management Guide

Related Solutions

Buffer size for capturing packets in kernel space

Tcpdump has the option -B to set the capture buffer size. The value is passed to libpcap (the library tcpdump uses for the actual packet capture) via the pcap_set_buffer_size() function. The tcpdump man page does not say in which units the -B value is given, but from the source it appears to be KiB.

The manual page of pcap_set_buffer_size() does not specify the default buffer size (used if the function is not called), but again, from the libpcap source, it seems to be 2 MiB, at least on Linux (it is most likely system dependent).

With regard to packet buffering and dropping, you should also pay attention to setting the snaplen (-s) parameter appropriately. From man tcpdump:

-s     Snarf snaplen bytes of data from each packet rather than the
default of 65535 bytes. Packets truncated because of a limited snapshot
are indicated in the output with ``[|proto]'', where proto is the name of
the protocol level at which the truncation has occurred. Note that taking
larger snapshots both increases the amount of time it takes to
process packets and, effectively, decreases the amount of packet buffering.
This may cause packets to be lost. You should limit snaplen to the
smallest number that will capture the protocol information you're
interested in. Setting snaplen to 0 sets it to the default of 65535, for
backwards compatibility with recent older versions of tcpdump.

This means that with a fixed buffer size, you can increase the number of packets that fit into the buffer (and are thus not dropped) by decreasing the snaplen.
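The effect can be estimated with simple arithmetic. The sketch below assumes each buffered packet occupies exactly snaplen bytes and ignores per-packet bookkeeping overhead, so the numbers are only rough worst-case estimates. The function name and the 96-byte snaplen are illustrative choices, not values taken from tcpdump.

```python
def packets_in_buffer(buffer_kib: int, snaplen: int) -> int:
    """Rough number of full-snaplen packets that fit in the capture buffer."""
    if buffer_kib <= 0 or snaplen <= 0:
        raise ValueError("buffer size and snaplen must be positive")
    return (buffer_kib * 1024) // snaplen


DEFAULT_BUFFER_KIB = 2048  # the 2 MiB default mentioned above

full = packets_in_buffer(DEFAULT_BUFFER_KIB, 65535)        # default snaplen
headers_only = packets_in_buffer(DEFAULT_BUFFER_KIB, 96)   # hypothetical small snaplen
print(full, headers_only)  # prints "32 21845"
```

Under these assumptions, a 2 MiB buffer holds a few dozen full-size snapshots but tens of thousands of header-only ones.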

What should be the buffer size for the sort command

You don't specify the OS and the sort implementation; I assume you mean GNU sort. You also don't say how long "a lot of time" is, or how long you expect it to take. Most important, you don't mention the I/O subsystem capability, which will be the governing factor.

An ordinary SATA drive delivers ~150 MB/s. At that rate your 150 GB file will take 1000 seconds just to read, about 17 minutes. Try $ time cat filename >/dev/null to see. If ~17 minutes (or whatever time cat shows) is OK, you might be able to get sort(1) to work in about 3X the time, because the output has to be written, too.
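The estimate above is plain division. A short sketch with decimal units (1 GB = 10^9 bytes) and the 3X factor taken as a rule of thumb from the paragraph above, not as a measurement:

```python
def read_time_seconds(size_bytes: float, bytes_per_second: float) -> float:
    """Time needed to stream a file once at a sustained throughput."""
    if bytes_per_second <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / bytes_per_second


GB = 10**9
MB = 10**6

read_s = read_time_seconds(150 * GB, 150 * MB)  # 1000.0 seconds
sort_s = 3 * read_s                             # rough estimate for the sort
print(f"read: {read_s / 60:.1f} min, sort: {sort_s / 60:.1f} min")
```

With these inputs the read alone takes 1000 seconds and the sort estimate comes to 50 minutes. The measured time from cat should replace the assumed throughput whenever it is available.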

Your best bet for speedup would seem to be --parallel, because your data fit in memory and you have spare processors. According to the info page, --buffer-size won't matter, because

... this option affects only the initial buffer size. The buffer grows beyond SIZE if `sort' encounters input lines larger than SIZE.

whereas a quick search indicates GNU sort uses merge sort, which is amenable to parallelization.

If you really want to know how GNU sort determines buffer sizes and what algorithm it uses for parallel sorting, the coreutils source code and accompanying documentation are readily available.
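A hedged sketch of how such an invocation could be assembled from a script. It assumes GNU sort, a comma-separated file, and a lexical sort on the second column. The function names and the output file name are hypothetical, and setting LC_ALL=C is an extra assumption that makes sort compare raw bytes.

```python
import os
import subprocess


def build_sort_command(src, dst, threads=None, buffer_size=None):
    """Build a GNU sort command line keyed on the second CSV column."""
    threads = threads or os.cpu_count() or 1
    cmd = ["sort", f"--parallel={threads}", "-t", ",", "-k", "2,2"]
    if buffer_size:
        cmd.append(f"--buffer-size={buffer_size}")
    cmd += ["-o", dst, src]
    return cmd


def run_sort(src, dst, **options):
    """Run the sort with byte-wise collation (assumption: LC_ALL=C)."""
    env = dict(os.environ, LC_ALL="C")
    subprocess.run(build_sort_command(src, dst, **options), check=True, env=env)
```

For example, run_sort("master_matrix_unsorted.csv", "master_matrix_sorted.csv", threads=8) would use eight sorting threads. Splitting on commas with -t is only safe when no field contains a quoted comma.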

But if I were you I wouldn't bother. Whatever you're doing with master_matrix_unsorted.csv, sort(1) is surely not up to the task.

First, a CSV file will, one day, trip you up because the CSV syntax is far beyond sort's ken. Second, it is the slowest possible way, because sort(1) is forced to sort entire rows (of indeterminate length), not just the second column. Third, when you're done, what will you have? A sorted CSV file. Is that really better? Why does the order matter so very much?

Sorting sounds like one step along the way toward a goal that likely includes some kind of computation on the data, which computation will require numbers in binary format. If that's the case, you might as well get the CSV file into a more tractable, computable, binary format first in, say, a DBMS. You may find that sorting it turns out to be unnecessary to the ultimate goal.
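As one possible illustration of the DBMS route, the sketch below loads a CSV file into SQLite using Python's standard library. SQLite is only a stand-in for whichever DBMS you prefer. The table and column names (matrix, c0, c1, ...) are hypothetical, and the sketch assumes the file has no header row and the same number of fields in every row.

```python
import csv
import sqlite3


def load_csv(conn, csv_path, table="matrix"):
    """Load a CSV file into a SQLite table with columns c0, c1, ..."""
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle)  # handles quoted fields, unlike sort(1)
        first = next(reader)
        columns = [f"c{i}" for i in range(len(first))]
        conn.execute(f"CREATE TABLE {table} ({', '.join(columns)})")
        placeholders = ", ".join("?" for _ in columns)
        insert = f"INSERT INTO {table} VALUES ({placeholders})"
        conn.execute(insert, first)
        conn.executemany(insert, reader)
    # An index on the second column makes ordered reads cheap.
    conn.execute(f"CREATE INDEX {table}_c1 ON {table} (c1)")
    conn.commit()
    return len(columns)
```

Once loaded, a query such as SELECT * FROM matrix ORDER BY c1 returns the rows in second-column order without rewriting the file. In this sketch the values are stored as text, so the order is lexical unless the column is cast to a number.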
