Linux – Is it expected to have a system frozen for more than an hour because of intensive swapping

freeze, linux, swap

Following on from this question: What can make Linux so unresponsive?

I am forced to wait a minute or more to get rid of the bloat, and sometimes the system stays unresponsive for twelve minutes, which is frustrating. That the OS cannot handle multitasking well strikes me as absolutely weird and unacceptable behavior.

[…]

These are the results of resource usage per htop:

1  [|||||                    14.1%]   Tasks: 286, 1497 thr; 2 running
2  [|||||                    13.2%]   Load average: 3.00 4.97 6.09 
3  [|||||                    12.5%]   Uptime: 3 days, 16:12:35
4  [|||                       9.3%]
Mem[|||||||||||||||||||5.09G/7.61G]
Swp[|||||||||||||||||||3.68G/4.65G]

[…]

I also have a spinny disk and 8GB RAM. I have had problems with a couple of pieces of software with memory leaks. That is, their memory usage keeps growing over time and never shrinks, so the only way to control it would have been to stop the software and then restart it. Based on the experiences I had during this, I am not very surprised to hear of delays over ten minutes, if you are generating over 3GB of swap.

I would like to ask you to clarify this. "Philippos" in his comment says "Frozen for more than one hour? I saw that only once, on a system swapping to a dying hard disk". Is it expected to have a system frozen for more than an hour because of intensive swapping?

Best Answer

You're asking me to guess and put an upper bound on it.

I can try to share my experience. I won't say you shouldn't ask for high standards, I just want to be realistic about the standard that Linux currently meets :-).

Take your amount of RAM, swap, and type of storage. Suppose the RAM usage is due to multiple interactive apps, and only one of them is being interacted with. Suppose also that you hadn't left an operation running in any of the other apps, and that the other apps don't have a large number of tabs with animated advertisements in them :). In that case, I think you make a good point! My current intuition says it would be unusual for the system to take longer than 10 minutes to clear up and be workable.

Do I think you should ever wait 10 minutes, hoping the mouse cursor will start working and the disk light will calm down again?

Not exactly. First, suppose the hope is just to wait for the GUI to settle and become usable. If that takes even as long as 2 minutes, and my goal isn't just to start closing some windows of the current foreground app, then I'd expect the delay to keep happening again. That's clearly too long.

But secondly, this could be any of various possible problems. For example, if half the problem is a concurrent operation in a second program that I'm not aware of, then the system's working set could be larger than RAM. In that case, yes, the system could thrash for more hours than you can count. So you wouldn't want to wait.
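
One way to tell "slow but recovering" from "thrashing indefinitely" is the kernel's pressure stall information. This is only a rough sketch, not something from the original discussion. It assumes a kernel with PSI enabled (Linux 4.20 or later, exposing /proc/pressure/memory). The function names and the 50% threshold are my own illustrative choices, not an established rule.

```python
def parse_pressure(text):
    """Parse the contents of /proc/pressure/memory.

    Each line looks like:
        some avg10=0.00 avg60=0.00 avg300=0.00 total=0
    Returns a dict such as {"some": {"avg10": 0.0, ...}, "full": {...}}.
    """
    result = {}
    for line in text.strip().splitlines():
        kind, *fields = line.split()
        result[kind] = {
            key: float(value)
            for key, value in (field.split("=", 1) for field in fields)
        }
    return result


def looks_like_thrashing(pressure, threshold=50.0):
    """Guess whether the system is thrashing.

    "full avg10" is the percentage of the last 10 seconds in which all
    non-idle tasks were stalled on memory. The threshold is an arbitrary
    assumption chosen for illustration.
    """
    return pressure.get("full", {}).get("avg10", 0.0) >= threshold


if __name__ == "__main__":
    try:
        with open("/proc/pressure/memory") as f:
            current = parse_pressure(f.read())
    except OSError:
        print("PSI is not available on this system")
    else:
        print(current)
        print("thrashing?", looks_like_thrashing(current))
```

If the "full" figure stays high for minutes on end, waiting longer is unlikely to help.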

If I was trying to get insights about what was going wrong, my maximum timeout might be 15 minutes. This would be the overall timeout for getting data out of a sequence such as:

  • ctrl+alt+f6
    • If that takes too long to switch to a text terminal, then use alt+sysrq+R and try again. Beware that if you switch back to the GUI and ever press ctrl+c, the entire GUI will be killed.
  • log in
  • sudo tmux - text window manager. Now I can run multiple commands as root, and switch between them, without getting login or sudo delays.
  • atop -R - did I mention I love atop?
  • iotop - horrible delays are usually about I/O. This is a nice tool that does one thing :-).
  • journalctl --since=-1hour -f
  • ...
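
If none of those tools are installed, a rough substitute is to sample the swap counters in /proc/vmstat yourself. The sketch below is my own illustration rather than part of the sequence above. It assumes the standard `pswpin` and `pswpout` counters (pages swapped in and out since boot); the function names and the one-second interval are arbitrary.

```python
import time


def read_swap_counters(text):
    """Extract the pswpin/pswpout counters from /proc/vmstat contents."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[0] in ("pswpin", "pswpout"):
            counters[parts[0]] = int(parts[1])
    return counters


def swap_rate(before, after, seconds):
    """Pages per second swapped in and out between two samples."""
    return {
        name: (after[name] - before[name]) / seconds
        for name in before
        if name in after
    }


def sample(path="/proc/vmstat"):
    with open(path) as f:
        return read_swap_counters(f.read())


if __name__ == "__main__":
    interval = 1.0  # seconds between samples; arbitrary choice
    try:
        first = sample()
        time.sleep(interval)
        second = sample()
    except OSError:
        print("/proc/vmstat is not readable on this system")
    else:
        print(swap_rate(first, second, interval))
```

Sustained high rates in both directions at once suggest the working set does not fit in RAM.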