Linux – is `sync + drop_caches` not dropping caches

Tags: benchmark, cache, linux-kernel, performance

I have a test case for journalctl where it spends several seconds reading from the disk. But if I try to benchmark multiple runs of the test case, I find that it's impossibly fast after the first run, even if I drop caches between runs. Why?

$ sync && echo 1 | sudo tee /proc/sys/vm/drop_caches && /usr/bin/time journalctl -b -u dev-shm.mount
1
0.01user 0.03system 0:04.50elapsed 1%CPU (0avgtext+0avgdata 30956maxresident)k
95424inputs+0outputs (424major+665minor)pagefaults 0swaps
$ sync && echo 1 | sudo tee /proc/sys/vm/drop_caches && /usr/bin/time journalctl -b -u dev-shm.mount >/dev/null
1
0.00user 0.01system 0:00.08elapsed 26%CPU (0avgtext+0avgdata 31832maxresident)k
94992inputs+0outputs (422major+445minor)pagefaults 0swaps

Interestingly time still shows it doing lots of IO through page faults (inputs). I notice that if I skip the drop_caches between runs, it shows 0 instead.

Best Answer

drop_caches only affects the kernel filesystem cache. It does not affect caches in the underlying hardware. Apparently your hardware has tens of megabytes of cache (on Linux, time counts inputs in 512-byte blocks, so 94992 * 512 ~= 49MB). Awesome!

In my case, it is because the kernel is running in a VM. So the "underlying hardware" is not a simple hard disk. [Screenshot: the disk settings used by virt-manager.]

The option used for "caching mode" respects write flushes (using fsync()), but otherwise allows caching both writes and reads in the host kernel's page cache. The "underlying hardware" effectively includes a disk cache within the host's RAM, potentially growing to multiple gigabytes.

libvirt / KVM calls this "writeback" caching.

I've also noticed that this speeds up rebooting the VM.
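
To confirm which cache mode a guest disk is using without opening virt-manager, one option is to look at the cache attribute of the disk's <driver> element in the libvirt domain XML. The helper below is only a sketch: disk_cache_modes is a name made up for this example, and myvm in the usage comment is a placeholder domain name. If a disk has no explicit cache attribute, nothing is printed for it and the hypervisor default applies.

```shell
# Read libvirt domain XML on stdin and print the value of every
# cache= attribute found on a <driver> element, one per line.
disk_cache_modes() {
    grep -o "<driver[^>]*cache=['\"][a-z]*['\"]" |
        sed "s/.*cache=['\"]\([a-z]*\)['\"]/\1/"
}

# Usage on the host (needs libvirt; "myvm" is a placeholder):
#   virsh dumpxml myvm | disk_cache_modes
```

Assuming the explanation above is right, cold-read benchmarks inside the guest would also need the host's page cache dropped, or a cache mode such as none that bypasses it.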

Related Solutions

Linux – Is “sync” before drop_caches necessary

I got the answer on Stack Overflow, and corroborated it with a small experiment.

"sync" only turns dirty cache into clean cache; the cache itself is preserved. drop_caches does not touch dirty pages and only drops clean ones. So to free as much memory as possible, it is necessary to run sync before drop_caches, in case the flusher threads have not yet written the changes to disk.

My blog about this little experiment -

What are exactly O_DIRECT, O_SYNC Flags, Buffers & Cached in Linux-Storage I/O?

Stack Overflow link -

“sync” before drop_caches, is it necessary?
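
The effect of sync described above can be watched through the Dirty line in /proc/meminfo, which reports how much memory is waiting to be written back. A minimal sketch, assuming a Linux system with /proc mounted; dirty_kb and show_sync_effect are hypothetical helper names, not existing tools:

```shell
# Print the amount of dirty (not yet written back) memory, in kB.
dirty_kb() {
    awk '/^Dirty:/ { print $2 }' /proc/meminfo
}

# Show the Dirty counter before and after sync.
show_sync_effect() {
    echo "Dirty before sync: $(dirty_kb) kB"
    sync    # write dirty pages to disk; they stay cached, but are now clean
    echo "Dirty after sync:  $(dirty_kb) kB"
}

# Usage:
#   show_sync_effect
#   echo 1 > /proc/sys/vm/drop_caches   # as root: drop the now-clean page cache
```

The second figure would be expected to be at or near zero, although other processes may dirty new pages in the meantime.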

Linux doesn’t drop FS Caches. Instead Memory starts Swapping

The Linux swapping algorithm works with the concept of "least recently used" (LRU) pages. Each page in virtual memory has an age associated with it. If a page is accessed frequently, it is considered young; if it is not being accessed, it grows older. The older a page gets, the more likely it is to be swapped out.

So if the kernel swaps stuff out, it's because those pages are old compared to the others. If there is enough physical memory for all pages, regardless of their age, nothing will be swapped.

The kernel is configured to handle its resources, such as memory and swap, in the most efficient way possible.

I don't think you should change that behavior. But, if you want, you can change the system swappiness. A swappiness setting of 0 means that swapping to disk will be avoided unless absolutely necessary (you run out of memory).

From the Kernel Documentation about the value of swappiness:

This control is used to define how aggressive the kernel will swap memory pages. Higher values will increase aggressiveness, lower values decrease the amount of swap. A value of 0 instructs the kernel not to initiate swap until the amount of free and file-backed pages is less than the high water mark in a zone.

In the Linux kernel source code, the file vmscan.c handles the swappiness value. Here is the interesting part:

/*
 * With swappiness at 100, anonymous and file have the same priority.
 * This scanning priority is essentially the inverse of IO cost.
 */
anon_prio = swappiness;
file_prio = 200 - anon_prio;

Anonymous pages are memory mappings with no file or device backing them. This is how programs allocate memory from the operating system for use by things like the stack and heap.
File pages mirror the contents of an existing file.

As you can see in the source code snippet above, with the default swappiness of 60 the priority for reclaiming file pages (file_prio = 140) is higher than the priority for swapping out anonymous pages (anon_prio = 60). If swappiness is set to 100, both have the same priority. If it is set to 0, the priority difference is as large as possible.
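
To make the arithmetic concrete, the two assignments from the vmscan.c excerpt can be replayed in the shell for a few swappiness values. This only reproduces the two lines quoted above, not the rest of the kernel's reclaim logic; scan_priorities is a name invented for this sketch.

```shell
# Replay the anon_prio / file_prio arithmetic from the vmscan.c excerpt.
scan_priorities() {
    swappiness=$1
    anon_prio=$swappiness
    file_prio=$((200 - anon_prio))
    echo "swappiness=$swappiness anon_prio=$anon_prio file_prio=$file_prio"
}

for s in 0 60 100; do
    scan_priorities "$s"
done
# prints:
#   swappiness=0 anon_prio=0 file_prio=200
#   swappiness=60 anon_prio=60 file_prio=140
#   swappiness=100 anon_prio=100 file_prio=100
```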

You can set the swappiness as follows:

echo n >/proc/sys/vm/swappiness

... where n is a value from 0 to 100. Writing to this file requires root.
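
A slightly more defensive version of the same step is sketched below. It checks the value against the 0 to 100 range given above before writing it. valid_swappiness and set_swappiness are hypothetical helper names, and the write itself still has to be done as root.

```shell
# Succeed only if the argument is an integer in the 0-100 range.
valid_swappiness() {
    case $1 in
        ''|*[!0-9]*) return 1 ;;    # empty or contains a non-digit
    esac
    [ "$1" -ge 0 ] && [ "$1" -le 100 ]
}

# Write a validated value to the swappiness control (run as root).
set_swappiness() {
    if valid_swappiness "$1"; then
        echo "$1" > /proc/sys/vm/swappiness
    else
        echo "swappiness must be an integer from 0 to 100" >&2
        return 1
    fi
}

# Usage:
#   cat /proc/sys/vm/swappiness    # show the current value
#   set_swappiness 10              # as root
#   sysctl -w vm.swappiness=10     # equivalent, via sysctl
```

The setting made this way does not survive a reboot.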
