Linux Freeze – What Causes Unresponsiveness When Browsing Certain Websites?

freezelinux

I'm using Linux 4.15, and this happens to me many times when I browse Google, Facebook or any other resource-hungry website – The whole OS becomes unresponsive, frozen and useless. The only thing I see it to be working is the disk (main system partition formatted as ext4), which is massively in use (I/O throttling).

I get forced to wait for a minute or more to get rid of the bloat, sometimes it stays unresponsive for twelve minutes, and hence I get frustrated. The fact the OS being not able to well-handle multitasking, tends to reflect an absolutely weird and unacceptable behavior.

Not only this happens with Firefox, but with any javascript-interpreter application including Microsoft VSCode or angular-cli (ng serve command) as well as any other resource-hungry thread of execution – such as the case of plantuml when generating a very large graph from a very complex UML diagram.

Today, the OS becomes totally unmanageable, after launching a data recovery software for an external HDD (over ext4 partition) that got lately unplugged from a bad USB port by little move.

I'm not able to tell the root cause behind such buggy behavior

I have many tabs opened in the browser, and 94% OS-partition usage as per df output:

Filesystem     1K-blocks      Used Available Use% Mounted on
udev             3964160         0   3964160   0% /dev
tmpfs             798164      3192    794972   1% /run
/dev/sda5      173466400 153224316  11407424  94% /
tmpfs            3990820     62936   3927884   2% /dev/shm
tmpfs               5120         4      5116   1% /run/lock
tmpfs            3990820         0   3990820   0% /sys/fs/cgroup
/dev/loop5           128       128         0 100% /snap/anbox-installer/24
/dev/loop2           128       128         0 100% /snap/anbox-installer/17
/dev/loop4        223616    223616         0 100% /snap/kde-frameworks-5/26
/dev/loop3         90624     90624         0 100% /snap/core/7169
/dev/loop7        223616    223616         0 100% /snap/kde-frameworks-5/25
/dev/loop8         90624     90624         0 100% /snap/core/7270
/dev/loop0         87552     87552         0 100% /snap/qownnotes/2160
/dev/loop1        241664    241664         0 100% /snap/kde-frameworks-5/27
tmpfs             798164         0    798164   0% /run/user/0
tmpfs             798164        32    798132   1% /run/user/1000
/dev/loop9         87552     87552         0 100% /snap/qownnotes/2176
/dev/sda3      188669948 187132488   1537460 100% /media/kais/DATA
/dev/sdb1       15142960   2091904  13051056  14% /media/kais/STORE N GO

As hardware, I'm using:

  1. Intel Core i3 v2348M as per lscpu:

    Architecture:        x86_64
    CPU op-mode(s):      32-bit, 64-bit
    Byte Order:          Little Endian
    Address sizes:       36 bits physical, 48 bits virtual
    CPU(s):              4
    On-line CPU(s) list: 0-3
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    NUMA node(s):        1
    Vendor ID:           GenuineIntel
    CPU family:          6
    Model:               42
    Model name:          Intel(R) Core(TM) i3-2348M CPU @ 2.30GHz
    Stepping:            7
    CPU MHz:             905.312
    CPU max MHz:         2300.0000
    CPU min MHz:         800.0000
    BogoMIPS:            4589.49
    Virtualization:      VT-x
    L1d cache:           32K
    L1i cache:           32K
    L2 cache:            256K
    L3 cache:            3072K
    NUMA node0 CPU(s):   0-3
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm arat pln pts
    
  2. 8 GB of RAM. (See htop output below).

  3. 99.83 MHz of mainboard bus speed
  4. 500 GB internal HDD – This is the S.M.A.R.T. report from the
    operating system:

    smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.15.0-33-generic] (local build)
    Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org
    
    === START OF INFORMATION SECTION ===
    Model Family:     Western Digital Blue Mobile
    Device Model:     WDC WD5000LPVX-22V0TT0
    Serial Number:    WD-WXE1E13AAMR4
    LU WWN Device Id: 5 0014ee 25db04ba7
    Firmware Version: 01.01A01
    User Capacity:    500,107,862,016 bytes [500 GB]
    Sector Sizes:     512 bytes logical, 4096 bytes physical
    Rotation Rate:    5400 rpm
    Device is:        In smartctl database [for details use: -P show]
    ATA Version is:   ACS-2 (minor revision not indicated)
    SATA Version is:  SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s)
    Local Time is:    Wed Aug  7 15:52:05 2019 CET
    SMART support is: Available - device has SMART capability.
    SMART support is: Enabled
    
    === START OF READ SMART DATA SECTION ===
    SMART overall-health self-assessment test result: PASSED
    
    General SMART Values:
    Offline data collection status:  (0x00) Offline data collection activity
                        was never started.
                        Auto Offline Data Collection: Disabled.
    Self-test execution status:      (   0) The previous self-test routine completed
                        without error or no self-test has ever 
                        been run.
    Total time to complete Offline 
    data collection:        ( 8040) seconds.
    Offline data collection
    capabilities:            (0x7b) SMART execute Offline immediate.
                        Auto Offline data collection on/off support.
                        Suspend Offline collection upon new
                        command.
                        Offline surface scan supported.
                        Self-test supported.
                        Conveyance Self-test supported.
                        Selective Self-test supported.
    SMART capabilities:            (0x0003) Saves SMART data before entering
                        power-saving mode.
                        Supports SMART auto save timer.
    Error logging capability:        (0x01) Error logging supported.
                        General Purpose Logging supported.
    Short self-test routine 
    recommended polling time:    (   2) minutes.
    Extended self-test routine
    recommended polling time:    (  93) minutes.
    Conveyance self-test routine
    recommended polling time:    (   5) minutes.
    SCT capabilities:          (0x7035) SCT Status supported.
                        SCT Feature Control supported.
                        SCT Data Table supported.
    
    SMART Attributes Data Structure revision number: 16
    Vendor Specific SMART Attributes with Thresholds:
    ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
      1 Raw_Read_Error_Rate     0x002f   200   200   051    Pre-fail  Always       -       1
      3 Spin_Up_Time            0x0027   149   143   021    Pre-fail  Always       -       1541
      4 Start_Stop_Count        0x0032   057   057   000    Old_age   Always       -       43173
      5 Reallocated_Sector_Ct   0x0033   200   200   140    Pre-fail  Always       -       0
      7 Seek_Error_Rate         0x002e   200   200   000    Old_age   Always       -       0
      9 Power_On_Hours          0x0032   083   083   000    Old_age   Always       -       12797
     10 Spin_Retry_Count        0x0032   100   100   000    Old_age   Always       -       0
     11 Calibration_Retry_Count 0x0032   100   100   000    Old_age   Always       -       0
     12 Power_Cycle_Count       0x0032   091   091   000    Old_age   Always       -       9496
    191 G-Sense_Error_Rate      0x0032   001   001   000    Old_age   Always       -       250
    192 Power-Off_Retract_Count 0x0032   200   200   000    Old_age   Always       -       399
    193 Load_Cycle_Count        0x0032   147   147   000    Old_age   Always       -       160989
    194 Temperature_Celsius     0x0022   101   092   000    Old_age   Always       -       42
    196 Reallocated_Event_Count 0x0032   200   200   000    Old_age   Always       -       0
    197 Current_Pending_Sector  0x0032   200   200   000    Old_age   Always       -       0
    198 Offline_Uncorrectable   0x0030   100   253   000    Old_age   Offline      -       0
    199 UDMA_CRC_Error_Count    0x0032   200   200   000    Old_age   Always       -       0
    200 Multi_Zone_Error_Rate   0x0008   100   253   000    Old_age   Offline      -       0
    
    SMART Error Log Version: 1
    No Errors Logged
    
    SMART Self-test log structure revision number 1
    No self-tests have been logged.  [To run self-tests, use: smartctl -t]
    
    SMART Selective self-test log data structure revision number 1
     SPAN  MIN_LBA  MAX_LBA  CURRENT_TEST_STATUS
        1        0        0  Not_testing
        2        0        0  Not_testing
        3        0        0  Not_testing
        4        0        0  Not_testing
        5        0        0  Not_testing
    Selective self-test flags (0x0):
      After scanning selected spans, do NOT read-scan remainder of disk.
    If Selective self-test is pending on power-up, resume after 0 minute delay.
    

These are the results of resource usage per htop:

  1  [|||||                    14.1%]   Tasks: 286, 1497 thr; 2 running
  2  [|||||                    13.2%]   Load average: 3.00 4.97 6.09 
  3  [|||||                    12.5%]   Uptime: 3 days, 16:12:35
  4  [|||                       9.3%]
  Mem[|||||||||||||||||||5.09G/7.61G]
  Swp[|||||||||||||||||||3.68G/4.65G]

  PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
 7006 jvb        20   0 6640M  102M  6780 S  5.3  1.3 18:53.18 java -Xmx3072m -X
 8224 kais     20   0 4537M  771M  200M S  6.6  9.9  2h31:23 /usr/lib/firefox/
 2299 kais     20   0 2958M  184M 42912 S  5.3  2.4 13:54.41 /usr/lib/firefox/
 1216 root       20   0  519M  120M 94640 S  5.3  1.5  1h52:50 /usr/lib/xorg/Xor
28401 kais     20   0 3354M  584M  107M S  7.9  7.5 34:44.51 /usr/lib/firefox/
 8439 kais     20   0 4537M  771M  200M S  4.6  9.9 37:06.21 /usr/lib/firefox/
 8831 kais     20   0 3222M  351M 64828 R  4.0  4.5 11:19.87 /usr/lib/firefox/
 7025 jvb        20   0 6640M  102M  6780 S  0.0  1.3  0:18.34 java -Xmx3072m -X
 7027 jvb        20   0 6640M  102M  6780 S  0.0  1.3  0:18.05 java -Xmx3072m -X
 5901 kais     20   0  7492  5612  2904 R  4.0  0.1  0:00.66 htop
 5329 kais     20   0  547M 47456 38388 S  1.3  0.6  0:01.29 /usr/lib/gnome-te
13540 kais     20   0 2958M  184M 42912 S  2.0  2.4  0:06.25 /usr/lib/firefox/
16897 kais     20   0  904M 28292 18076 S  2.0  0.4 50:08.37 pavucontrol
17999 kais     20   0 2424M 29460 25380 S  1.3  0.4 52:41.73 /usr/bin/pulseaud
F1 Help  F2 Setup  F3 Search  F4 Filter  F5 Tree  F6 SortBy F7 Nice  -  F8 Nice  +  F9 Kill  F10 Quit

Those are the results of VM statistics as well, generated by the command vmstat 5.

AFAIK, bloatware shouldn't make the OS unresponsive, so I wouldn't consider or even accept that the bloatware is the root cause of the problem – since the OS job is isolating processes and ensuring multitasking.

I don't know whether this issue is OS-specific, hardware-specific, or configuration-specific.

Any ideas?

Best Answer

What can make Linux so unresponsive?

Overcommitting available RAM, which causes a large amount of swapping, can definitely do this. Remember that random access I/O on your mechanical HDD requires moving a read/write head, which can only do around 100 seeks per second.

It's usual for Linux to go totally out to lunch, if you overcommit RAM "too much". I also have a spinny disk and 8GB RAM. I have had problems with a couple of pieces of software with memory leaks. I.e. their memory usage keeps growing over time and never shrinks, so the only way to control it would have been to stop the software and then restart it. Based on the experiences I had during this, I am not very surprised to hear delays over ten minutes, if you are generating 3GB+ of swap.

You won't necessarily see this in all cases where you have more than 3GB of swap. Theory says the key concept is thrashing. On the other hand, if you are trying to switch between two different working sets, and it requires swapping 3GB in and out, at 100MB/s it will take at least 60 seconds even if the I/O pattern can be perfectly optimized. In practice, the I/O pattern will be far from optimal.

After the difficulty I had with this, I reformatted my swap space to 2GB (several times smaller than before), so the system would not be able to swap as deeply. You can do this even without messing around resizing the partition, because mkswap takes an optional size parameter.

The rough balance is between running out of memory and having processes get killed, and having the system hang for so long that you give up and reboot anyway. I don't know if a 4GB swap partition is too large; it might depend what you're doing. The important thing is to watch out for when the disk starts churning, check your memory usage, and respond accordingly.

Checking memory usage of multi-process applications is difficult. To see memory usage per-process without double-counting shared memory, you can use sudo atop -R, press M and m, and look in the PSIZE column. You can also use smem. smem -t -P firefox will show PSS of all your firefox processes, followed by a line with total PSS. This is the correct approach to measure total memory usage of Firefox or Chrome based browsers. (Though there are also browser-specific features for showing memory usage, which will show individual tabs).

Related Question