Machine freezes once it hits swap space under heavy load

freeze swap troubleshooting

My machine has crashed several times. I can now reproduce the problem by starting a program that fills up all memory. Once the system starts writing to swap, it freezes and I have to reboot.

In the journal, I see no useful log information before the crash, for instance:

Mar 23 19:12:01 classen systemd[1]: Starting Cleanup of Temporary Directories...
Mar 23 19:12:01 classen systemd[1]: Started Cleanup of Temporary Directories.
Mar 23 19:12:08 classen wpa_supplicant[757]: wlp3s0: WPA: Group rekeying completed with ...
-- Reboot --
Mar 23 19:17:03 classen systemd-journald[380]: Runtime journal (/run/log/journal/) is 8.0M, max 796.6M, 788.6M free.

I don't know how to troubleshoot the problem. I hope that someone has seen something similar and can point me in the right direction. The strange thing is that after the system has been running for a while, it is able to swap to some degree (at least, top showed that some of the swap space was occupied). The freezes happen only under heavy swap load.


Here is my setup:

$ lsblk

NAME                    MAJ:MIN RM   SIZE RO TYPE  MOUNTPOINT
sda                       8:0    0 238.5G  0 disk  
├─sda1                    8:1    0   512M  0 part  /boot
└─sda2                    8:2    0   238G  0 part  
  └─MyStorage           254:0    0   238G  0 crypt 
    ├─MyStorage-swapvol 254:1    0    16G  0 lvm   [SWAP]
    └─MyStorage-rootvol 254:2    0   222G  0 lvm   /
sdb                       8:16   0 931.5G  0 disk  
└─sdb1                    8:17   0 931.5G  0 part  
sr0                      11:0    1  1024M  0 rom   

Relevant part of /etc/fstab:

/dev/mapper/MyStorage-rootvol   /    btrfs   rw,noatime,ssd,autodefrag,compress=lzo,space_cache      0 0
/dev/mapper/MyStorage-swapvol none   swap    defaults        0 0

UUID=63A7-3F81          /boot        vfat    rw,relatime,fmask=0022,dmask=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro    0 2

$ swapon --summary

Filename     Type         Size        Used   Priority
/dev/dm-1    partition    16777212    0      -1

I am running Arch Linux with a 4.4.5 kernel:

$ uname -a
Linux classen 4.4.5-1-ARCH #1 SMP PREEMPT Thu Mar 10 07:38:19 CET 2016 x86_64 GNU/Linux

Hooks in /etc/mkinitcpio.conf:

HOOKS="base udev autodetect modconf block encrypt lvm2 resume filesystems keyboard fsck"

Best Answer

After some experimenting, I can confirm that the cause was thrashing in combination with a huge swap partition (16 GB).

Thanks for the comments, Otheus and cas; you had the right intuition. I underestimated the effect, maybe because the machines I used previously had smaller swap spaces relative to their memory, so the memory-hungry process was eventually killed.
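To compare swap size with physical memory on a given machine, something like the following sketch could be used. The function name swap_to_ram_percent is made up for illustration; it only assumes the MemTotal and SwapTotal fields that Linux exposes in /proc/meminfo.

```shell
# Print swap size as a percentage of physical RAM.
# Reads /proc/meminfo-style text on stdin (hypothetical helper).
swap_to_ram_percent() {
    awk '
        $1 == "MemTotal:"  { mem = $2 }   # total RAM in kB
        $1 == "SwapTotal:" { swap = $2 }  # total swap in kB
        END { if (mem > 0) printf "%d\n", swap * 100 / mem }'
}
```

Typical use would be swap_to_ram_percent < /proc/meminfo. A large value means a runaway process can push a lot of data into swap before the kernel's OOM killer steps in.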

As a safety measure, I will reduce the maximum swap space on my system. I also defined a per-process limit to guard against a single process blowing up the memory:

# limit virtual memory to roughly 10 GB per process (the value is in KiB)
ulimit -Sv 10000000
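If the limit should apply to a single command rather than the whole shell session, one option is to set it inside a subshell. The sketch below is only an illustration: gib_to_kib and run_limited are hypothetical helper names, and it assumes a shell such as bash or dash whose ulimit supports -v.

```shell
# Convert a size in GiB to the KiB units that `ulimit -v` expects.
gib_to_kib() {
    echo $(( $1 * 1024 * 1024 ))
}

# Run a command with a soft virtual-memory limit (first argument, in GiB).
# The subshell keeps the limit from leaking into the calling shell.
run_limited() {
    (
        ulimit -Sv "$(gib_to_kib "$1")" || exit 1
        shift
        "$@"
    )
}
```

For example, run_limited 10 ./memory-hungry-program (a placeholder command name) would start the program with a 10 GiB soft limit, while the surrounding shell keeps its original limits.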

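Tools like vmstat 1 can help to analyze the problem: it prints memory and swap statistics once per second, and persistently high values in the si and so columns (swap-in and swap-out) point to thrashing.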
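As a rough sketch of how that output could be watched automatically, the filter below flags samples with a lot of swap traffic. The name flag_swap_activity and the default threshold of 1000 are arbitrary choices for illustration; the code assumes the default vmstat column layout, where si and so are the seventh and eighth fields.

```shell
# Read vmstat output on stdin and report samples whose combined
# swap-in (si) and swap-out (so) rate exceeds a threshold.
flag_swap_activity() {
    awk -v limit="${1:-1000}" '
        # Data rows start with a number; header rows do not.
        $1 ~ /^[0-9]+$/ && ($7 + $8) > limit {
            print "heavy swapping: si=" $7 " so=" $8
        }'
}
```

It could be run as vmstat 1 | flag_swap_activity 1000. Note that the first data row vmstat prints shows averages since boot, so it is usually less informative than the rows that follow.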
