Suppose a program asks for some memory, but there is not enough free memory left. There are several different ways Linux could respond. One response is to select some other used memory, which has not been accessed recently, and move this inactive memory to swap.
However, I see many articles and comments that go beyond this. They say even when there is a large amount of free memory, Linux will sometimes decide to write inactive memory to swap. Writing to swap in advance means that when we eventually want to use this memory, we do not have to wait for a disk write. They say this is a deliberate strategy to optimize performance.
Are they right? Or is it a myth? Cite your source(s).
Please understand this question using the following definitions:
- swap
- free memory – the "free" memory displayed by the `free` command. This is the `MemFree` value from `/proc/meminfo`. `/proc/meminfo` is a virtual text file provided by the kernel. See proc(5), or RHEL docs.
- even when there is a large amount of free memory – for the purpose of argument, imagine there is more than 10% free memory.
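To make the "more than 10% free" definition concrete, here is a minimal Python sketch that computes `MemFree` as a percentage of `MemTotal` from `/proc/meminfo`-style text. The helper names (`parse_meminfo`, `free_percent`) and the sample numbers are illustrative assumptions, not values from any system mentioned in this question.

```python
def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of integer values (mostly kB)."""
    values = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Key: value" pairs
        fields = rest.split()
        if fields:
            values[key.strip()] = int(fields[0])
    return values


def free_percent(text):
    """Return MemFree as a percentage of MemTotal."""
    info = parse_meminfo(text)
    return 100.0 * info["MemFree"] / info["MemTotal"]


# Hypothetical sample, shaped like the first lines of /proc/meminfo.
SAMPLE = """\
MemTotal:       16000000 kB
MemFree:         3200000 kB
MemAvailable:    9000000 kB
"""

if __name__ == "__main__":
    print(f"{free_percent(SAMPLE):.1f}% free")
```

On a real Linux system, the same function could be fed the live file, for example `free_percent(open("/proc/meminfo").read())`.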
References
Here are some search terms: linux "opportunistic swapping" OR (swap "when the system has nothing better to do" OR "when it has nothing better to do" OR "when the system is idle" OR "during idle time")
In the second-highest result on Google, a StackExchange user asks "Why use swap when there is more than enough free space in RAM?", and copies the results of the `free` command showing about 20% free memory. In response to this specific question, I see this answer is highly voted:
Linux starts swapping before the RAM is filled up. This is done to improve performance and responsiveness:

- Performance is increased because sometimes RAM is better used for disk cache than to store program memory. So it's better to swap out a program that's been inactive for a while, and instead keep often-used files in cache.
- Responsiveness is improved by swapping pages out when the system is idle, rather than when the memory is full and some program is running and requesting more RAM to complete a task.

Swapping does slow the system down, of course, but the alternative to swapping isn't not swapping, it's having more RAM or using less RAM.
The first result on Google has been marked as a duplicate of the question above :-). In this case, the asker copied details showing 7GB `MemFree`, out of 16GB. The question has an accepted and upvoted answer of its own:
Swapping only when there is no free memory is only the case if you set `swappiness` to 0. Otherwise, during idle time, the kernel will swap memory. In doing this the data is not removed from memory, but rather a copy is made in the swap partition.

This means that, should the situation arise that memory is depleted, it does not have to write to disk then and there. In this case the kernel can just overwrite the memory pages which have already been swapped, for which it knows that it has a copy of the data.

The `swappiness` parameter basically just controls how much it does this.
The other quote does not explicitly claim the swapped data is retained in memory as well. But it seems like you would prefer that approach, if you are swapping even at times when you have 20% free memory, and the reason you are doing so is to improve performance.
As far as I know, Linux does support keeping a copy of the same data in both main memory and swap space.
I also noticed the common claim that "opportunistic swapping" happens "during idle time". I understand it's supposed to help reassure me that this feature is generally good for performance. I don't include this in my definition above, because I think it already has enough details to make a nice clear question. I don't want to make this more complicated than it needs to be.
Original motivation
atop shows `swout` (swapping) when I have gigabytes of free memory. Why?
There are a couple of reports like this, of Linux writing to swap when there is plenty of free memory. "Opportunistic swapping" might explain these reports. At the same time, at least one alternative cause was suggested. As a first step in looking at possible causes: Does Linux ever perform "opportunistic swapping" as defined above?
In the example I reported, the question has now been answered. The cause was not opportunistic swapping.
Best Answer
Linux does not do "opportunistic swapping" as defined in this question.
The following primary references do not mention the concept at all:
More specifically:
Based on the above, we would not expect any swapping when the number of free pages is higher than the "high watermark".
Secondly, this tells us the purpose of `kswapd` is to make more free pages.

When `kswapd` writes a memory page to swap, it immediately frees the memory page. `kswapd` does not keep a copy of the swapped page in memory.

Linux 2.6 uses the "rmap" to free the page. In Linux 2.4, the story was more complex. When a page was shared by multiple processes, `kswapd` was not able to free it immediately. This is ancient history. All of the linked posts are about Linux 2.6 or above.
This quote describes a special case: if you configure the `swappiness` value to be `0`. In this case, we should additionally not expect any swapping until the number of cache pages has fallen to the high watermark. In other words, the kernel will try to discard almost all file cache before it starts swapping. (This might cause massive slowdowns. You need to have some file cache! The file cache is used to hold the code of all your running programs :-)

What are the watermarks?
The above quotes raise the question: How large are the "watermark" memory reservations on my system? Answer: on a "small" system, the default zone watermarks might be as high as 3% of memory. This is due to the calculation of the "min" watermark. On larger systems the watermarks will be a smaller proportion, approaching 0.3% of memory.
So if the question is about a system with more than 10% free memory, the exact details of this watermark logic are not significant.
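The order of magnitude of these proportions can be sketched in a few lines of Python. This assumes the default "min" reservation follows the rule described in the comments of the kernel's `mm/page_alloc.c`, namely `min_free_kbytes = sqrt(lowmem_kbytes * 16)`, clamped to a range. The clamp bounds used here (128 kB and 65536 kB) are an assumption, and the upper bound has differed between kernel versions. The sketch only models the "min" watermark; the low and high watermarks sit somewhat above it and are not modelled.

```python
import math


def default_min_free_kbytes(lowmem_kbytes, floor_kb=128, ceiling_kb=65536):
    """Approximate the kernel's default min_free_kbytes (assumed formula and bounds)."""
    raw = math.isqrt(lowmem_kbytes * 16)  # integer square root, Python 3.8+
    return max(floor_kb, min(raw, ceiling_kb))


def min_watermark_percent(lowmem_kbytes):
    """The 'min' reservation as a percentage of memory."""
    return 100.0 * default_min_free_kbytes(lowmem_kbytes) / lowmem_kbytes


if __name__ == "__main__":
    for mib in (16, 1024, 16 * 1024):
        kb = mib * 1024
        print(f"{mib:>6} MiB: min = {default_min_free_kbytes(kb)} kB "
              f"({min_watermark_percent(kb):.2f}% of memory)")
```

Because the reservation grows with the square root of memory size, it shrinks as a fraction of memory on larger machines, which is why it is far below the 10% threshold used in this question.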
The watermarks for each individual "zone" are shown in `/proc/zoneinfo`, as documented in proc(5). An extract from my zoneinfo:

The current "watermarks" are `min`, `low`, and `high`. If a program ever asks for enough memory to reduce `free` below `min`, the program enters "direct reclaim". The program is made to wait while the kernel frees up memory.

We want to avoid direct reclaim if possible. So if `free` would dip below the `low` watermark, the kernel wakes `kswapd`. `kswapd` frees memory by swapping and/or dropping caches, until `free` is above `high` again.

Additional qualification: `kswapd` will also run to protect the full lowmem_reserve amount, for kernel lowmem and DMA usage. The default lowmem_reserve is about 1/256 of the first 4GiB of RAM (DMA32 zone), so it is usually around 16MiB.

Linux code commits
Linux code
It is sometimes claimed that changing `swappiness` to `0` will effectively disable "opportunistic swapping". This provides an interesting avenue of investigation. If there is something called "opportunistic swapping", and it can be tuned by swappiness, then we could chase it down by finding all the call-chains that read `vm_swappiness`. Note we can reduce our search space by assuming `CONFIG_MEMCG` is not set (i.e. "memory cgroups" are disabled). The call chain goes:

`shrink_node_memcg` is commented "This is a basic per-node page freer. Used by both kswapd and direct reclaim". I.e. this function increases the number of free pages. It is not trying to duplicate pages to swap so they can be freed at a much later time. But even if we discount that:

The above chain is called from three different functions, shown below. As expected, we can divide the call-sites into direct reclaim vs. kswapd. It would not make sense to perform "opportunistic swapping" in direct reclaim.
So, presumably the claim is that `kswapd` is woken up somehow, even when all memory allocations are being satisfied immediately from free memory. I looked through the uses of `wake_up_interruptible(&pgdat->kswapd_wait)`, and I am not seeing any wakeups like this.
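The watermark behaviour described in this answer can be summarised as a toy model. The Python sketch below is a deliberate simplification and not kernel code: the `Zone` class, the function names, and the page counts are all hypothetical, and it ignores lowmem_reserve, multiple zones, and allocation order. It only illustrates that a request which leaves `free` well above `low` triggers neither `kswapd` nor direct reclaim.

```python
from dataclasses import dataclass


@dataclass
class Zone:
    """Toy stand-in for one memory zone; all values are page counts."""
    free: int
    wmark_min: int
    wmark_low: int
    wmark_high: int


def reclaim_action(zone, pages):
    """Say what an allocation of `pages` would trigger in this simplified model."""
    after = zone.free - pages
    if after < zone.wmark_min:
        return "direct reclaim"  # the allocating program has to wait
    if after < zone.wmark_low:
        return "wake kswapd"     # background reclaim starts
    return "none"                # served from free memory, nothing is swapped


def pages_for_kswapd_to_free(zone):
    """Pages kswapd would need to free for `free` to reach the high watermark."""
    return max(0, zone.wmark_high - zone.free)


if __name__ == "__main__":
    # Hypothetical zone with plenty of free pages and small watermarks.
    zone = Zone(free=150_000, wmark_min=1_000, wmark_low=1_250, wmark_high=1_500)
    for request in (10_000, 148_800, 149_500):
        print(request, "->", reclaim_action(zone, request))
```

In this model there is no path that writes to swap while `free` stays above `low`, which is the same conclusion reached from reading the wakeup call-sites.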