Linux Swap – System Becomes Unresponsive Despite Free RAM

Tags: out of memory, ram, swap, x11

I have been experiencing a weird issue lately:

Sometimes (I cannot reproduce it on purpose), my system uses all of its swap even though there is more than enough free RAM. When this happens, the system becomes unresponsive for a couple of minutes, and then the OOM killer kills either a "random" process, which does not help much, or the X server.
If it kills a "random" process, the system stays unresponsive (there is still no free swap, but plenty of free RAM); if it kills X, the swap is freed and the system becomes responsive again.

Output of free when it happens:

$ free -htl
              total        used        free      shared  buff/cache   available
Mem:           7.6G        1.4G         60M        5.7G        6.1G        257M
Low:           7.6G        7.5G         60M
High:            0B          0B          0B
Swap:          3.9G        3.9G          0B
Total:          11G        5.4G         60M

uname -a:

Linux fedora 4.4.7-300.fc23.x86_64 #1 SMP Wed Apr 13 02:52:52 UTC 2016 x86_64 x86_64 x86_64 GNU/Linux


cat /proc/sys/vm/swappiness 

Relevant section in dmesg:


$ df -h -t tmpfs
Filesystem      Size  Used Avail Use% Mounted on
tmpfs           3.8G  1.5M  3.8G   1% /dev/shm
tmpfs           3.8G  1.7M  3.8G   1% /run
tmpfs           3.8G     0  3.8G   0% /sys/fs/cgroup
tmpfs           3.8G  452K  3.8G   1% /tmp
tmpfs           776M   16K  776M   1% /run/user/42
tmpfs           776M   32K  776M   1% /run/user/1000


top -o SHR -n 1
Tasks: 231 total,   1 running, 230 sleeping,   0 stopped,   0 zombie
%Cpu(s):  8.5 us,  3.0 sy,  0.3 ni, 86.9 id,  1.3 wa,  0.0 hi,  0.0 si,  0.0 st
KiB Mem :  7943020 total,   485368 free,   971096 used,  6486556 buff/cache
KiB Swap:  4095996 total,  1698992 free,  2397004 used.   989768 avail Mem 

  PID USER      PR  NI    VIRT    RES    SHR S  %CPU %MEM     TIME+ COMMAND                                             
 2066 mkamlei+  20   0 8342764 163908 145208 S   0.0  2.1   0:59.62 Xorg                                                
 2306 mkamlei+  20   0 1892816 138536  27168 S   0.0  1.7   1:25.47 gnome-shell                                         
 3118 mkamlei+  20   0  596392  21084  13152 S   0.0  0.3   0:04.86 gnome-terminal-                                     
 1646 gdm       20   0 1502632  60324  12976 S   0.0  0.8   0:01.91 gnome-shell                                         
 2269 mkamlei+  20   0 1322592  22440   8124 S   0.0  0.3   0:00.87 gnome-settings-                                     
  486 root      20   0   47048   8352   7656 S   0.0  0.1   0:00.80 systemd-journal                                     
 2277 mkamlei+   9 -11  570512  10080   6644 S   0.0  0.1   0:15.33 pulseaudio                                          
 2581 mkamlei+  20   0  525424  19272   5796 S   0.0  0.2   0:00.37 redshift-gtk                                        
 1036 root      20   0  619016   9204   5408 S   0.0  0.1   0:01.70 NetworkManager                                      
 1599 gdm       20   0 1035672  11820   5120 S   0.0  0.1   0:00.28 gnome-settings-                                     
 2386 mkamlei+  20   0  850856  24948   4944 S   0.0  0.3   0:05.84 goa-daemon                                          
 2597 mkamlei+  20   0 1138200  13104   4596 S   0.0  0.2   0:00.28 evolution-alarm                                     
 2369 mkamlei+  20   0 1133908  16472   4560 S   0.0  0.2   0:00.49 evolution-sourc                                     
 2529 mkamlei+  20   0  780088  54080   4380 S   0.0  0.7   0:01.14 gnome-software                                      
 2821 mkamlei+  20   0 1357820  44320   4308 S   0.0  0.6   0:00.23 evolution-calen                                     
 2588 mkamlei+  20   0 1671848  55744   4300 S   0.0  0.7   0:00.49 evolution-calen                                     
 2525 mkamlei+  20   0  613512   8928   4188 S   0.0  0.1   0:00.19 abrt-applet                                         


[mkamleithner@fedora ~]$ ipcs -m -t

------ Shared Memory Attach/Detach/Change Times --------
shmid      owner      attached             detached             changed             
294912     mkamleithn Apr 30 20:29:16      Not set              Apr 30 20:29:16     
393217     mkamleithn Apr 30 20:29:19      Apr 30 20:29:19      Apr 30 20:29:17     
491522     mkamleithn Apr 30 20:42:21      Apr 30 20:42:21      Apr 30 20:29:18     
524291     mkamleithn Apr 30 20:38:10      Apr 30 20:38:10      Apr 30 20:29:18     
786436     mkamleithn Apr 30 20:38:12      Not set              Apr 30 20:38:12     

[mkamleithner@fedora ~]$ ipcs

------ Message Queues --------
key        msqid      owner      perms      used-bytes   messages    

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status      
0x00000000 294912     mkamleithn 600        524288     2          dest         
0x00000000 393217     mkamleithn 600        2576       2          dest         
0x00000000 491522     mkamleithn 600        4194304    2          dest         
0x00000000 524291     mkamleithn 600        524288     2          dest         
0x00000000 786436     mkamleithn 600        4194304    2          dest         

------ Semaphore Arrays --------
key        semid      owner      perms      nsems     

[mkamleithner@fedora ~]$ sudo grep 786436 /proc/*/maps
/proc/2084/maps:7ff4a56cc000-7ff4a5acc000 rw-s 00000000 00:05 786436                     /SYSV00000000 (deleted)
/proc/3984/maps:7f4574d00000-7f4575100000 rw-s 00000000 00:05 786436                     /SYSV00000000 (deleted)
[mkamleithner@fedora ~]$ sudo grep 524291 /proc/*/maps
/proc/2084/maps:7ff4a4593000-7ff4a4613000 rw-s 00000000 00:05 524291                     /SYSV00000000 (deleted)
/proc/2321/maps:7fa9b8a67000-7fa9b8ae7000 rw-s 00000000 00:05 524291                     /SYSV00000000 (deleted)
[mkamleithner@fedora ~]$ sudo grep 491522 /proc/*/maps
/proc/2084/maps:7ff4a4ad3000-7ff4a4ed3000 rw-s 00000000 00:05 491522                     /SYSV00000000 (deleted)
/proc/2816/maps:7f2763ba1000-7f2763fa1000 rw-s 00000000 00:05 491522                     /SYSV00000000 (deleted)
[mkamleithner@fedora ~]$ sudo grep 393217 /proc/*/maps
/proc/2084/maps:7ff4b1a60000-7ff4b1a61000 rw-s 00000000 00:05 393217                     /SYSV00000000 (deleted)
/proc/2631/maps:7fb89be79000-7fb89be7a000 rw-s 00000000 00:05 393217                     /SYSV00000000 (deleted)
[mkamleithner@fedora ~]$ sudo grep 294912 /proc/*/maps
/proc/2084/maps:7ff4a5510000-7ff4a5590000 rw-s 00000000 00:05 294912                     /SYSV00000000 (deleted)
/proc/2582/maps:7f7902dd3000-7f7902e53000 rw-s 00000000 00:05 294912                     /SYSV00000000 (deleted)

Getting the process names:

[mkamleithner@fedora ~]$ ps aux | grep 2084
mkamlei+  2084  5.1  2.0 8149580 159272 tty2   Sl+  20:29   1:10 /usr/libexec/Xorg vt2 -displayfd 3 -auth /run/user/1000/gdm/Xauthority -nolisten tcp -background none -noreset -keeptty -verbose 3
mkamlei+  5261  0.0  0.0 118476  2208 pts/0    S+   20:52   0:00 grep --color=auto 2084
[mkamleithner@fedora ~]$ ps aux | grep 3984
mkamlei+  3984 11.4  3.6 1355100 293240 tty2   Sl+  20:38   1:38 /usr/lib64/firefox/firefox
mkamlei+  5297  0.0  0.0 118472  2232 pts/0    S+   20:52   0:00 grep --color=auto 3984

Should I also post the results for the other shmids? I don't really know how to interpret the output.

How can I fix this?

Edit: Starting the game "Papers, Please" always seems to trigger this problem after some time. It also happens sometimes when this game is not started, though.

Edit2: This seems to be an X issue. On Wayland it does not happen. It might be due to custom settings in xorg.conf.

Final Edit: For anyone experiencing the same problem: I was using DRI 2. Switching to DRI 3 also fixes the problem. This is the relevant section of my xorg.conf:

Section "Device"
    Identifier  "Intel Graphics"
    Driver      "intel"
    Option      "AccelMethod"     "sna"
    Option      "Backlight"       "intel_backlight"
    BusID       "PCI:0:2:0"
    Option      "DRI"             "3" # here
    Option      "TearFree"        "true"
EndSection

The relevant file on my system is in /usr/share/X11/xorg.conf.d/.

Best Answer

The free man page describes the "shared" column as:

    shared: Memory used (mostly) by tmpfs (Shmem in /proc/meminfo, available on kernels 2.6.32, displayed as zero if not available)

So the man page definition of Shared is not as helpful as it could be :(. If the tmpfs usage does not account for this high value of Shared, then the value must represent some process(es) "who did mmap() with MAP_SHARED|MAP_ANONYMOUS" (or System V shared memory).

6G of shared memory on an 8G system is still a lot. Seriously, you don't want that, at least not on a desktop.

It's weird that it seems to contribute to "buff/cache" as well. But I did a quick test with python and that's just how it works.
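
A quick test along those lines might look like the sketch below. It is not the exact script used for this answer: the helper names (parse_meminfo, read_meminfo, demo) and the 64 MiB size are assumptions made for illustration. It maps anonymous shared memory, touches every page, and prints the Shmem and Cached fields of /proc/meminfo before and after, so you can see whether both rise on your kernel.

```python
import mmap
import os


def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo-style text (values mostly in kB)."""
    info = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        parts = rest.split()
        if parts and parts[0].isdigit():
            info[name.strip()] = int(parts[0])
    return info


def read_meminfo(path="/proc/meminfo"):
    with open(path) as f:
        return parse_meminfo(f.read())


def demo(size=64 * 1024 * 1024):
    """Map `size` bytes of anonymous shared memory and compare meminfo."""
    before = read_meminfo()
    # fd -1 with MAP_SHARED | MAP_ANONYMOUS: shared memory with no backing file
    region = mmap.mmap(-1, size, flags=mmap.MAP_SHARED | mmap.MAP_ANONYMOUS)
    # Pages are only allocated (and counted) once they have been written to
    for offset in range(0, size, mmap.PAGESIZE):
        region[offset] = 1
    after = read_meminfo()
    for key in ("Shmem", "Cached"):
        print(f"{key}: {before[key]} kB -> {after[key]} kB")
    region.close()


# Only meaningful on Linux, where /proc/meminfo exists
if __name__ == "__main__" and os.path.exists("/proc/meminfo"):
    demo()
```

Other activity on the machine also moves these counters, so treat small differences as noise and look for a jump of roughly the mapped size.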

To show the processes with the most shared memory, use top -o SHR -n 1.
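
If you would rather script this than read top's output, here is a rough sketch under one assumption: that the third field of /proc/<pid>/statm (resident shared pages) is the figure top reports as SHR. The function names shared_kib and top_shared are made up for this example.

```python
import os


def shared_kib(statm_text, page_size=4096):
    """Convert the 'shared' field of /proc/<pid>/statm from pages to KiB.

    statm fields, all in pages: size resident shared text lib data dt
    """
    fields = statm_text.split()
    return int(fields[2]) * page_size // 1024


def top_shared(limit=10):
    """Return up to `limit` (shared KiB, pid, name) tuples, largest first."""
    page_size = os.sysconf("SC_PAGE_SIZE")
    rows = []
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/statm") as f:
                shr = shared_kib(f.read(), page_size)
            with open(f"/proc/{pid}/comm") as f:
                name = f.read().strip()
        except OSError:
            continue  # process exited in the meantime, or is not readable
        rows.append((shr, int(pid), name))
    return sorted(rows, reverse=True)[:limit]


# Only meaningful on Linux, where /proc exists
if __name__ == "__main__" and os.path.isdir("/proc"):
    for shr, pid, name in top_shared():
        print(f"{shr:>10} KiB  {pid:>7}  {name}")
```

Like top's SHR column, this only counts resident pages, so shared memory that has been swapped out or is not attached to any process will not appear here.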

System V shared memory

Finally, it's possible you have some horrible legacy software that uses System V shared memory segments. If they get leaked, they won't show up in top :(.

You can list them with ipcs -m -t. Hopefully the most recently created one is still in use. Take its shmid number and grep for it in /proc/*/maps, e.g.:

$ ipcs -m -t

------ Shared Memory Attach/Detach/Change Times --------
shmid      owner      attached             detached             changed             
3538944    alan       Apr 30 20:35:15      Apr 30 20:35:15      Apr 30 16:07:41     
3145729    alan       Apr 30 20:35:15      Apr 30 20:35:15      Apr 30 15:04:09     
4587522    alan       Apr 30 20:37:38      Not set              Apr 30 20:37:38     

$ sudo grep 4587522 /proc/*/maps

The numbers shown in the /proc paths of the matching lines are the PIDs of the processes that use the SHM segment. (So you could, for example, grep the output of ps for that PID.)
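
The same lookup can be sketched in Python. This is only an illustration: maps_line_matches and pids_mapping_shmid are hypothetical helper names, and it relies on what the output in the question shows, namely that a System V segment appears in /proc/<pid>/maps with the shmid in the inode column and a path starting with /SYSV. It is stricter than the plain grep, which also matches the number anywhere else in the line.

```python
import glob


def maps_line_matches(line, shmid):
    """True if a /proc/<pid>/maps line maps the System V segment `shmid`."""
    fields = line.split()
    # Columns: address perms offset dev inode [pathname ...]
    return (
        len(fields) >= 6
        and fields[4] == str(shmid)
        and fields[5].startswith("/SYSV")
    )


def pids_mapping_shmid(shmid):
    """Return the sorted PIDs whose maps file references `shmid`."""
    pids = set()
    for path in glob.glob("/proc/[0-9]*/maps"):
        try:
            with open(path) as f:
                if any(maps_line_matches(line, shmid) for line in f):
                    pids.add(int(path.split("/")[2]))
        except OSError:
            continue  # process exited, or we lack permission to read it
    return sorted(pids)


if __name__ == "__main__":
    # 4587522 is the shmid from the ipcs example above; substitute your own
    print(pids_mapping_shmid(4587522))
```

Run it as root (as with sudo grep) if you want to see processes owned by other users.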

Apparent contradictions

  1. Xorg has 8G mapped, even though you don't have separate video card RAM, and only about 150M of that is resident. The rest cannot simply be swapped out, because you don't have enough swap space for it.

  2. The SHM segments shown by ipcs are all attached to two processes. So none of them have leaked, and they should all show up in the SHR column of top (double-counted, even). It's OK if the number of pages used is less than the size of the memory segment; that just means some pages haven't been touched yet. But free says we have 6GB of allocated shared memory to account for, and we can't find it.
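
To put a number on the second point, this small sketch adds up the "bytes" column from the ipcs output in the question. The sizes are copied from that output; the variable names are just for illustration.

```python
# shmid -> size in bytes, copied from the question's `ipcs` output
segment_bytes = {
    294912: 524288,
    393217: 2576,
    491522: 4194304,
    524291: 524288,
    786436: 4194304,
}

total = sum(segment_bytes.values())
print(f"{total} bytes = {total / 2**20:.1f} MiB")
```

This prints 9439760 bytes = 9.0 MiB, so the System V segments account for about 9 MiB of the 5.7G that free reports as shared.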

