Linux – How does Linux make sure to reclaim useless “buffers”, from caching the ext4 journal writes, before anything else (e.g. before swapping at all)

Tags: cache, ext4, linux, memory

My laptop tends to run with about 256MB counted as "Buffers" – in /proc/meminfo and free -w -h – out of about 8GB of RAM.
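For reference, those figures come from `/proc/meminfo`. A small sketch (my own helper, assuming the standard meminfo field layout) that pulls out `Buffers` and reports it as a share of RAM; the sample text mirrors the numbers above, but on a real system you would read `/proc/meminfo` instead:

```python
# Parse /proc/meminfo-style text and report "Buffers" as a share of RAM.
# The SAMPLE below is made up to mirror the question (~256MB of ~8GB);
# on a Linux box, read the real file with open("/proc/meminfo").read().

SAMPLE = """\
MemTotal:        8010120 kB
MemFree:          514304 kB
Buffers:          262144 kB
Cached:          3145728 kB
"""

def parse_meminfo(text):
    """Return a dict mapping field name -> size in kB."""
    fields = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields[name] = int(rest.split()[0])  # values are reported in kB
    return fields

info = parse_meminfo(SAMPLE)
share = 100 * info["Buffers"] / info["MemTotal"]
print(f"Buffers: {info['Buffers']} kB ({share:.1f}% of RAM)")
```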

I'm interested in memory usage. I sometimes get into trouble using too much. I'm not worried as such about the ~256MB "Buffers" usage, but I am curious.

I have worked out which software uses it, and the usage appears to be (almost entirely?) unnecessary 8-). I have two ext4 filesystems mounted, each with a journal size of 128MB. The ~256MB "Buffers" usage is basically all cached writes from ext4 journals.

I can see no need to cache the entire journal file of each filesystem. (Most of the time, only a small amount of the journal holds "live data"; I am not using data=journal). I am interested in this specific unnecessary "Buffers" usage. I understand that there may be other uses, and some of them may be more necessary. E.g. for all I know, it might be useful to cache the part of the journal which does currently hold live data.

When I was investigating, I noticed that "Buffers" were 30% of the physical RAM on a smaller system!

My question is, how well does Linux make sure to drop the bulk of these unnecessary "Buffers", when memory is requested for any other purpose, including for page cache? Please cite the evidence that your beliefs are based on.

I am not specifically interested in historical differences, only the behaviour of a "current" system. If you are interested, my laptop is currently running Fedora 28, kernel version 4.18.16-200.fc28.x86_64. (Or the smaller system is running Debian 9, kernel version 4.9.0-8-marvell).

Relevant details about Linux page cache

I am more familiar with the idea of the page cache – Cached in /proc/meminfo and free -w -h – than I am with the behaviour of "Buffers".

I have recently re-read this thread: What page replacement algorithms are used in Linux kernel for OS file cache?

Accesses to uncached file pages, e.g. the reads and writes when copying an uncached file, are cached on the "inactive" LRU list. When reclaiming memory from the page cache, the kernel prefers to start with the "inactive" pages which were least recently used. It prefers these over the "active" pages, which are pages that have been accessed more than once, even if they might be older.

In particular, notice this allows you to run an arbitrarily large file copy without swapping out all your running programs. All of the important memory pages of your running programs will be on the "active" list, because those pages will have been accessed more than once.
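To make the two-list idea concrete, here is a toy model (my own simplification, not kernel code): a page enters the inactive list on first access, is promoted to the active list on a second access, and reclaim takes from the inactive tail before touching anything active:

```python
from collections import OrderedDict

class TwoListLRU:
    """Toy model of the kernel's active/inactive page lists.

    First access puts a page on the inactive list; a second access
    promotes it to the active list. Reclaim evicts the least recently
    used inactive page before touching any active page.
    """

    def __init__(self):
        self.inactive = OrderedDict()  # oldest entries first
        self.active = OrderedDict()

    def access(self, page):
        if page in self.active:
            self.active.move_to_end(page)    # refresh recency
        elif page in self.inactive:
            del self.inactive[page]          # second access: promote
            self.active[page] = True
        else:
            self.inactive[page] = True       # first access

    def reclaim(self):
        """Evict one page, preferring the oldest inactive one."""
        victim_list = self.inactive if self.inactive else self.active
        page, _ = victim_list.popitem(last=False)
        return page

lru = TwoListLRU()
for p in ["prog1", "prog2"]:              # "important" pages, touched twice
    lru.access(p); lru.access(p)
for p in ["copy1", "copy2", "copy3"]:     # big one-off file copy
    lru.access(p)
print([lru.reclaim() for _ in range(3)])  # -> ['copy1', 'copy2', 'copy3']
```

The file-copy pages are reclaimed first, even though they were touched more recently than the program pages, because they were each only accessed once.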

This is a first approximation to the complexities of Linux memory management. I use some vague words like "prefers" here, because I am not an expert on these complexities.

Ideally, I would like any unnecessary "Buffers" to be reclaimed first, before reclaiming any page cache. (In case you count "Buffers" as part of the page cache, you should instead understand that I would like any unnecessary "Buffers" to be reclaimed before any cached regular files).

So I am curious. Will the unnecessary "Buffers" be reclaimed before any of the "inactive" page cache? Or can I only say that they tend to be reclaimed before "active" page cache? Or are there more details that must be explained, before we can make any comparison with the page cache?

Best Answer

That is usually what happens, though not because of an explicit preference, but because the access count of these pages is usually low. The memory subsystem maps disk blocks, physical memory, and virtual addresses in process address spaces to each other; the only difference between a buffer or cache page and a process allocation is whether a process mapping exists.

Whenever there is memory pressure, the system evicts the memory pages whose last access was longest ago: starting with those that have up-to-date copies on disk, then moving on to those where a disk mapping exists and the page just needs to be written back, and finally creating new disk mappings by allocating swap space.
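That ordering can be sketched as a ranking by what must happen before a page's memory can be reused (an illustrative model, not the kernel's actual reclaim heuristics):

```python
# Illustrative ranking of eviction candidates -- a simplification of the
# ordering described above, not the kernel's real reclaim logic.
# Lower cost means the page can be reclaimed sooner under pressure.

def eviction_cost(page):
    if page["file_backed"] and not page["dirty"]:
        return 0   # up-to-date copy on disk: just drop the page
    if page["file_backed"]:
        return 1   # disk mapping exists: write it back, then drop
    return 2       # anonymous: must allocate swap space first

pages = [
    {"name": "heap page",           "file_backed": False, "dirty": True},
    {"name": "clean journal cache", "file_backed": True,  "dirty": False},
    {"name": "dirty file page",     "file_backed": True,  "dirty": True},
]

for p in sorted(pages, key=eviction_cost):
    print(p["name"])
# clean journal cache, then dirty file page, then heap page
```

On this model, clean cache pages – such as buffer pages holding already-written journal blocks – are the cheapest candidates, which matches the observation that they tend to go first.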

In this system, it is beneficial to create swap mappings in advance, before memory ever gets tight. When the system is otherwise idle, it can then copy some pages that haven't been accessed for a while to disk but leave them in memory as well.

This is fundamentally the same as a cache page for a disk block, except that the process mapping allows the page to be accessed from a process, resetting its eviction timer. If the process is sleeping and has no work to do, evicting such a page rather than a cache page that is actively used is usually the better choice.

A lot of cache pages are accessed only once or twice, so they are good candidates for eviction most of the time, without needing any special status.
