Linux – Can kswapd be active if free memory is well above the pages_high watermark?

linux-kernel swap virtual-memory

I'm struggling to understand the inner workings of the page frame reclamation algorithm in RHEL 6.

More specifically, I want to understand why we are seeing non-zero values of si/so in vmstat, and other signs of swapping, when free memory never drops below pages_low (or even pages_high).

From vmstat:

  procs -----------memory---------- ---swap-- -----io---- --system-- -----cpu-----
   r  b    swpd    free   buff    cache   si   so    bi    bo    in     cs us sy id wa st
  13  4 2476036 1533508 486264 10396996   18   22  9674  2790 59364 114558  7  8 81  4  0

i.e. there's 1533508 kilobytes of free memory on the system.

From /proc/zoneinfo:

Node 0, zone Normal
…
 min 130364
 low 162955
 high 195546 

The fact that we see non-zero swap-in and swap-out activity (si>0, so>0) while free memory (about 383k pages, assuming 4 KiB pages) is well above both the low and high watermarks seems to be at odds with how swapping is described in the documentation and literature.
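To make the comparison explicit, here is the arithmetic as a minimal sketch. It assumes 4 KiB pages (typical for x86_64, but not confirmed by the output above). Also keep in mind that vmstat reports free memory for the whole system, while the watermarks quoted are for Node 0, zone Normal only, so this is a rough comparison rather than an exact per-zone check.

```python
PAGE_SIZE_KB = 4  # assumption: 4 KiB pages


def kb_to_pages(kb, page_size_kb=PAGE_SIZE_KB):
    """Convert a size in kilobytes to a whole number of pages."""
    return kb // page_size_kb


free_kb = 1533508  # "free" column from the vmstat sample
watermarks = {"min": 130364, "low": 162955, "high": 195546}  # from /proc/zoneinfo

free_pages = kb_to_pages(free_kb)
print(f"free: {free_pages} pages")
for name, pages in watermarks.items():
    # Ratio of free pages to each watermark
    print(f"{name:>4}: {pages} pages, free is {free_pages / pages:.1f}x that")
```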

E.g. “Understanding Linux Virtual Memory” by Mel Gorman:

“Historically, kswapd used to wake up every 10 seconds but now it is
only woken by the physical page allocator when the pages_low number of
free pages in a zone is reached”

Later on the book offers one possible explanation to what we are seeing:

“Under extreme memory pressure, processes will do the work of kswapd
synchronously by calling balance_classzone() which calls
try_to_free_pages_zone()”

i.e. when memory allocation requests fail or are slow, processes can initiate zone balancing themselves. However, it’s not clear whether this can account for swapping, as try_to_free_pages_zone seems to be focused on shrinking various caches.

Also, we often see kswapd in top when observing signs of swapping, which also seems to be at odds with the direct reclamation theory.

Is there something I'm missing here?

Update: I specifically checked the ExaWatcher ps output taken during a period of swapping, and I can see the kswapd0 process in the "R" state during these times. This rules out the direct reclamation scenario.
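Another way to cross-check this, besides ps snapshots, is to compare the page-scan counters in /proc/vmstat. The sketch below is only an illustration: it assumes counters whose names start with pgscan_kswapd and pgscan_direct (with per-zone suffixes on older kernels), and the exact names vary between kernel versions. The function name reclaim_scan_totals is my own.

```python
def reclaim_scan_totals(vmstat_text):
    """Sum pages scanned by kswapd and by direct reclaim.

    Expects text in the "name value" format of /proc/vmstat.
    Counter names are an assumption and differ between kernels.
    """
    totals = {"kswapd": 0, "direct": 0}
    for line in vmstat_text.splitlines():
        parts = line.split()
        if len(parts) != 2 or not parts[1].isdigit():
            continue  # skip anything that is not "name value"
        name, value = parts[0], int(parts[1])
        if name.endswith("_throttle"):
            continue  # throttle events are not page scans
        for kind in totals:
            if name.startswith("pgscan_" + kind):
                totals[kind] += value
    return totals


def read_reclaim_scan_totals(path="/proc/vmstat"):
    """Read the counters from a live system (Linux only)."""
    with open(path) as f:
        return reclaim_scan_totals(f.read())
```

The counters are cumulative since boot, so the useful number is the difference between two samples taken around a swapping episode. If only the kswapd total grows, that is consistent with what ps shows.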

Best regards,
Nikolai

Best Answer

I was able to find at least one scenario that can lead to swapping pages out of main memory while free memory is well above any zone watermarks. The scenario has to do with zone compaction, one of the algorithms for VM defragmentation.

The basic idea behind the process is to move pages around to create large contiguous chunks of free physical memory. "Moving around" means migrating a page's contents to a different page frame and updating its PTEs, so that the virtual addresses seen by processes stay the same.

The compaction algorithm runs two scanners from opposite ends of a zone, working their way towards each other. One scanner searches for pages to move, the other one for free pages where they could be moved to, and eventually they are supposed to meet somewhere in the middle.
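The two-scanner idea can be illustrated with a toy model. This is not the kernel's code, just a sketch under my own assumptions: a zone is a list of page states ("M" movable, "F" free, "U" unmovable), the migration scanner starts at the low end, and the free scanner starts at the high end.

```python
def compact(zone):
    """Toy two-scanner compaction.

    Returns (new_zone, pages_moved). Page states:
    "M" = movable, "F" = free, "U" = unmovable.
    """
    zone = list(zone)
    migrate = 0            # scans upward for movable pages
    free = len(zone) - 1   # scans downward for free pages
    moved = 0
    while migrate < free:
        if zone[migrate] != "M":
            migrate += 1   # nothing to move here
        elif zone[free] != "F":
            free -= 1      # no free target here
        else:
            # "Migrate" the page: the target becomes used, the source free
            zone[free], zone[migrate] = "M", "F"
            moved += 1
            migrate += 1
            free -= 1
    return zone, moved


def longest_free_run(zone):
    """Length of the longest run of contiguous free pages."""
    best = run = 0
    for page in zone:
        run = run + 1 if page == "F" else 0
        best = max(best, run)
    return best


before = list("MFMFFMFM")
after, moved = compact(before)
print("".join(before), longest_free_run(before))
print("".join(after), longest_free_run(after), "moved:", moved)
```

Trying the same function on a zone that contains a "U" page, e.g. list("MFUMFFMF"), shows how a single page that cannot be moved keeps the free area split in two.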

The thing is, during zone compaction it is possible to find a page that cannot be moved but can still be reclaimed. When this happens, the algorithm may try to reclaim it by swapping it out.

The important thing here is that zone compaction is not triggered by any watermarks. Rather, it happens whenever a high-order allocation fails, i.e. it can happen when there's still plenty of free memory left, if this memory is fragmented enough.
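Whether free memory is fragmented can be checked from /proc/buddyinfo, which lists, per zone, the number of free blocks of each order (a block of order k is 2^k contiguous pages). A minimal sketch of reading it follows; the function names and the sample line are made up for illustration.

```python
def parse_buddyinfo(text):
    """Parse /proc/buddyinfo-style text.

    Returns {(node, zone_name): [free block count per order]}.
    """
    zones = {}
    for line in text.splitlines():
        parts = line.split()
        # Expected shape: Node <n>, zone <name> <order 0> <order 1> ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        node = int(parts[1].rstrip(","))
        zones[(node, parts[3])] = [int(x) for x in parts[4:]]
    return zones


def free_pages_at_or_above(counts, order):
    """Free pages sitting in blocks of at least the given order."""
    return sum(n << k for k, n in enumerate(counts) if k >= order)


# Hypothetical sample: lots of free pages, but none in blocks of order 3+
sample = "Node 0, zone   Normal   1000  500   10    0    0    0    0    0    0    0    0"
counts = parse_buddyinfo(sample)[(0, "Normal")]
print("total free pages:", free_pages_at_or_above(counts, 0))
print("free pages in order>=3 blocks:", free_pages_at_or_above(counts, 3))
```

In a situation like the hypothetical sample, an order-3 allocation would fail despite thousands of free pages, which is the case where compaction kicks in.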
