RHEL7 – Addressing Memory Fragmentation Issues

Tags: linux, memory, rhel

I have a long-running server application that should run trouble-free for months. After the appliance was moved to RHEL7, the system started to suffer from memory fragmentation after about 2-3 days of normal load. The kernel logs many "page allocation failure" messages for almost every process, indicating that it cannot allocate order-4 pages in the Normal zone even though plenty of low-order pages are free. Here's an example:

kernel: [85531.010995] sh: page allocation failure: order:4, mode:0x2040d0
kernel: [85531.011000] CPU: 1 PID: 20846 Comm: sh Not tainted 3.10.0-693.el7.AV1.x86_64 #1
kernel: [85531.011002] Hardware name: VMware, Inc. VMware Virtual Platform/440BX Desktop Reference Platform, BIOS 6.00 09/21/2015
kernel: [85531.011003]  00000000002040d0 00000000d00413f4 ffff8800070ffa18 ffffffff816a3e1d
kernel: [85531.011006]  ffff8800070ffaa8 ffffffff81188d00 0000000000000000 ffff88023ffd8000
kernel: [85531.011008]  0000000000000004 00000000002040d0 ffff8800070ffaa8 00000000d00413f4
kernel: [85531.011010] Call Trace:
kernel: [85531.011018]  [<ffffffff816a3e1d>] dump_stack+0x19/0x1b
kernel: [85531.011023]  [<ffffffff81188d00>] warn_alloc_failed+0x110/0x180
kernel: [85531.011026]  [<ffffffff8169fe1a>] __alloc_pages_slowpath+0x6b6/0x724
kernel: [85531.011028]  [<ffffffff8118d275>] __alloc_pages_nodemask+0x405/0x420
kernel: [85531.011031]  [<ffffffff811d15f8>] alloc_pages_current+0x98/0x110
kernel: [85531.011035]  [<ffffffff811dc36c>] new_slab+0x2fc/0x310
kernel: [85531.011037]  [<ffffffff811ddbfc>] ___slab_alloc+0x3ac/0x4f0
kernel: [85531.011042]  [<ffffffff810850be>] ? copy_process+0x18e/0x19a0
kernel: [85531.011044]  [<ffffffff810850be>] ? copy_process+0x18e/0x19a0
kernel: [85531.011046]  [<ffffffff816a117e>] __slab_alloc+0x40/0x5c
kernel: [85531.011049]  [<ffffffff811e00cb>] kmem_cache_alloc_node+0x8b/0x200
kernel: [85531.011051]  [<ffffffff810850be>] copy_process+0x18e/0x19a0
kernel: [85531.011053]  [<ffffffff81086a81>] do_fork+0x91/0x320
kernel: [85531.011056]  [<ffffffff81086d96>] SyS_clone+0x16/0x20
kernel: [85531.011059]  [<ffffffff816b5259>] stub_clone+0x69/0x90
kernel: [85531.011061]  [<ffffffff816b4f09>] ? system_call_fastpath+0x16/0x1b
kernel: [85531.011062] Mem-Info:
kernel: [85531.011066] active_anon:1145227 inactive_anon:278512 isolated_anon:0
kernel: [85531.011066]  active_file:181319 inactive_file:185784 isolated_file:0
kernel: [85531.011066]  unevictable:2695 dirty:4333 writeback:0 unstable:0
kernel: [85531.011066]  slab_reclaimable:45889 slab_unreclaimable:54798
kernel: [85531.011066]  mapped:79471 shmem:52418 pagetables:11994 bounce:0
kernel: [85531.011066]  free:33850 free_pcp:0 free_cma:0
kernel: [85531.011069] Node 0 DMA free:15868kB min:132kB low:164kB high:196kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15992kB managed:15908kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:0kB slab_unreclaimable:8kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? yes
kernel: [85531.011073] lowmem_reserve[]: 0 2809 7800 7800
kernel: [85531.011076] Node 0 DMA32 free:53892kB min:24292kB low:30364kB high:36436kB active_anon:1622080kB inactive_anon:516652kB active_file:203244kB inactive_file:212104kB unevictable:2312kB isolated(anon):0kB isolated(file):0kB present:3129280kB managed:2878656kB mlocked:2312kB dirty:6236kB writeback:0kB mapped:115972kB shmem:79808kB slab_reclaimable:77740kB slab_unreclaimable:90500kB kernel_stack:13680kB pagetables:17624kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: [85531.011080] lowmem_reserve[]: 0 0 4990 4990
kernel: [85531.011082] Node 0 Normal free:65640kB min:43152kB low:53940kB high:64728kB active_anon:2958828kB inactive_anon:597396kB active_file:522032kB inactive_file:531032kB unevictable:8468kB isolated(anon):0kB isolated(file):0kB present:5242880kB managed:5110372kB mlocked:8464kB dirty:11096kB writeback:0kB mapped:201912kB shmem:129864kB slab_reclaimable:105816kB slab_unreclaimable:128684kB kernel_stack:19936kB pagetables:30352kB unstable:0kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
kernel: [85531.011085] lowmem_reserve[]: 0 0 0 0
kernel: [85531.011087] Node 0 DMA: 1*4kB (U) 1*8kB (U) 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15868kB
kernel: [85531.011095] Node 0 DMA32: 2946*4kB (UEM) 1995*8kB (UEM) 1241*16kB (UEM) 186*32kB (UEM) 9*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54128kB
kernel: [85531.011102] Node 0 Normal: 16005*4kB (UEM) 248*8kB (UEM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 66004kB
kernel: [85531.011108] Node 0 hugepages_total=0 hugepages_free=0 hugepages_surp=0 hugepages_size=2048kB
kernel: [85531.011109] 428930 total pagecache pages
kernel: [85531.011110] 8261 pages in swap cache
kernel: [85531.011111] Swap cache stats: add 51264, delete 43003, find 2892763/2894481
kernel: [85531.011112] Free swap  = 5078128kB
kernel: [85531.011113] Total swap = 5242876kB
kernel: [85531.011114] 2097038 pages RAM
kernel: [85531.011114] 0 pages HighMem/MovableOnly
kernel: [85531.011115] 95804 pages reserved
kernel: [85531.011116] SLUB: Unable to allocate memory on node -1 (gfp=0xd0)
kernel: [85531.011118]   cache: task_struct, object size: 45024, buffer size: 45024, default order: 4, min order: 4
kernel: [85531.011119]   node 0: slabs: 2114, objs: 2114, free: 0
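
The per-order free lists near the end of this log can be summarized mechanically. Below is a minimal Python sketch, assuming the `count*sizekB` line format shown above and a 4 kB base page; the function names `free_by_order` and `free_kb_at_or_above` are made up for illustration. It converts each block size to its order and sums the free memory held in blocks of order 4 or higher, which is what the failing `task_struct` slab allocation asks for.

```python
import re

PAGE_KB = 4  # assumption: 4 kB base pages, as on x86_64

def free_by_order(line):
    """Parse 'N*SkB' pairs from a free-list log line into {order: block_count}."""
    orders = {}
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
        # A block of size S kB spans S/4 pages; its order is log2 of that.
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        orders[order] = int(count)
    return orders

def free_kb_at_or_above(orders, min_order):
    """Total free kB held in blocks of at least min_order."""
    return sum(n * PAGE_KB * 2**o for o, n in orders.items() if o >= min_order)

# The two lines are copied from the log above.
dma32 = free_by_order(
    "Node 0 DMA32: 2946*4kB (UEM) 1995*8kB (UEM) 1241*16kB (UEM) 186*32kB (UEM) "
    "9*64kB (U) 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 54128kB"
)
normal = free_by_order(
    "Node 0 Normal: 16005*4kB (UEM) 248*8kB (UEM) 0*16kB 0*32kB 0*64kB 0*128kB "
    "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 66004kB"
)

print("DMA32  free kB at order >= 4:", free_kb_at_or_above(dma32, 4))
print("Normal free kB at order >= 4:", free_kb_at_or_above(normal, 4))
print("Normal free kB in total:     ", free_kb_at_or_above(normal, 0))
```

For these two lines the sketch reports 576 kB of order-4-or-larger memory in DMA32 and none in Normal, although Normal has about 64 MB free in total. That is the fragmentation the failure message points at.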

Thus, I have some questions:

  1. What can cause memory fragmentation on the system?
  2. Is it possible to determine which process is causing the fragmentation (e.g. which process uses order-4 pages the most)?
  3. And, of course, how can I tune the system to avoid memory fragmentation?

UPD:

  1. I've found out that the CONFIG_COMPACTION option can help in my case, but I cannot find how to enable it or check its current state. So, how can I check or enable it?
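
One way to check the state of CONFIG_COMPACTION is sketched below in Python. It rests on two assumptions: that the kernel build configuration is installed as `/boot/config-<release>` (RHEL does this; other systems may differ), and that `/proc/sys/vm/compact_memory` exists only on kernels built with compaction. The helper names `compaction_enabled` and `check_running_kernel` are hypothetical. Note that CONFIG_COMPACTION is a build-time option, so it can be checked at runtime but not switched on without a different kernel build.

```python
import os
import platform

def compaction_enabled(config_text):
    """Return True if the kernel build config text sets CONFIG_COMPACTION=y."""
    return any(
        line.strip() == "CONFIG_COMPACTION=y" for line in config_text.splitlines()
    )

def check_running_kernel():
    """Check the build config of the running kernel; None if it cannot be read."""
    # Assumption: the config file lives under /boot and is named after the
    # kernel release (the same string that `uname -r` prints).
    path = "/boot/config-" + platform.release()
    try:
        with open(path) as f:
            return compaction_enabled(f.read())
    except OSError:
        return None

if __name__ == "__main__":
    print("CONFIG_COMPACTION=y in build config:", check_running_kernel())
    # On kernels built with compaction this control file is present; writing 1
    # to it as root asks the kernel to compact all zones.
    print("compact_memory present:", os.path.exists("/proc/sys/vm/compact_memory"))
```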

Everything worked fine before on RHEL6 and RHEL5.

# uname -a
Linux <hostname> 3.10.0-693.21.1.el7.AV1.x86_64 #1 SMP Thursday April 5, 2018 09:26:08 MDT x86_64 x86_64 x86_64 GNU/Linux

This is a VM running on ESXi 6.5.


UPD1: The system is running short of order-4 pages again right now. The kernel message shows that at the moment of allocation there were enough order-4 and larger blocks free in the DMA32 zone, but none in the Normal zone.

[82794.805373] Node 0 DMA: 1*4kB (U) 0*8kB 1*16kB (U) 1*32kB (U) 1*64kB (U) 1*128kB (U) 1*256kB (U) 0*512kB 1*1024kB (U) 1*2048kB (M) 3*4096kB (M) = 15860kB
[82794.805384] Node 0 DMA32: 4528*4kB (UEM) 2604*8kB (UEM) 1544*16kB (UEM) 142*32kB (UE) 19*64kB (EM) 3*128kB (U) 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 69792kB
[82794.805393] Node 0 Normal: 17041*4kB (UEM) 183*8kB (UEM) 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 69628kB

Is it possible to somehow make the system allocate from DMA32? I'm not an expert in this area, so any information is appreciated.
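
To watch how the zones differ over time without waiting for the next failure message, the same counters can be read from `/proc/buddyinfo`. The Python sketch below is only an illustration: `parse_buddyinfo` and `blocks_at_or_above` are made-up names, and the sample text is the three log lines above rewritten by hand into the `/proc/buddyinfo` column layout (one count per order, starting at order 0).

```python
def parse_buddyinfo(text):
    """Map (node, zone) -> list of free block counts, indexed by order."""
    zones = {}
    for line in text.splitlines():
        parts = line.replace(",", "").split()
        # Expected shape: Node <n> zone <name> <order-0 count> <order-1 count> ...
        if len(parts) < 5 or parts[0] != "Node" or parts[2] != "zone":
            continue
        zones[(int(parts[1]), parts[3])] = [int(c) for c in parts[4:]]
    return zones

def blocks_at_or_above(counts, min_order):
    """Number of free blocks of at least min_order."""
    return sum(counts[min_order:])

# Hand-converted from the UPD1 log lines; not captured from a live system.
sample = """\
Node 0, zone      DMA      1      0      1      1      1      1      1      0      1      1      3
Node 0, zone    DMA32   4528   2604   1544    142     19      3      0      0      0      0      0
Node 0, zone   Normal  17041    183      0      0      0      0      0      0      0      0      0
"""

for (node, zone), counts in parse_buddyinfo(sample).items():
    print(node, zone, "blocks of order >= 4:", blocks_at_or_above(counts, 4))
```

On a live system the input would be `open("/proc/buddyinfo").read()` instead of `sample`. Logging the result every few minutes shows when the Normal zone runs out of order-4 blocks.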


UPD2: I've tried adjusting kernel parameters such as vm.swappiness and vm.dirty_ratio, but that only postponed the failures. Increasing the amount of memory didn't help either.


UPD3: Dropping the kernel caches with echo 3 > /proc/sys/vm/drop_caches avoids the "page allocation failure" messages for a while. But I understand that it is not a permanent solution, since it hurts performance.

Best Answer

While trying to tune the kernel to avoid page allocation failures, I arrived at these values:

vm.swappiness=10
vm.dirty_ratio=20
vm.vfs_cache_pressure=400

With this configuration the failures occur far less often. I also found that one of the heavily loaded, long-running processes leaks memory, which could be contributing to the fragmentation as well.
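
To confirm that a machine is actually running with these values, the settings can be read back from `/proc/sys`. This is a small Python sketch, assuming the usual mapping of a sysctl name such as `vm.swappiness` to the file `/proc/sys/vm/swappiness`; `TARGETS`, `read_sysctl` and `diff_from_targets` are hypothetical names.

```python
# The three values from this answer.
TARGETS = {"vm.swappiness": 10, "vm.dirty_ratio": 20, "vm.vfs_cache_pressure": 400}

def sysctl_path(name):
    """Translate a dotted sysctl name into its /proc/sys file path."""
    return "/proc/sys/" + name.replace(".", "/")

def read_sysctl(name):
    """Read an integer sysctl value from the running kernel."""
    with open(sysctl_path(name)) as f:
        return int(f.read().strip())

def diff_from_targets(targets, reader=read_sysctl):
    """Return {name: (current, wanted)} for every setting that differs."""
    out = {}
    for name, wanted in targets.items():
        current = reader(name)
        if current != wanted:
            out[name] = (current, wanted)
    return out

if __name__ == "__main__":
    for name, (current, wanted) in diff_from_targets(TARGETS).items():
        print(f"{name}: current={current}, wanted={wanted}")
```

To keep the values across reboots, they would normally go into `/etc/sysctl.conf` or a file under `/etc/sysctl.d/`.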
