I'm using Linux 4.15, and this happens to me many times when I browse Google, Facebook or any other resource-hungry website – The whole OS becomes unresponsive, frozen and useless. The only thing I see it to be working is the disk (main system partition formatted as ext4), which is massively in use (I/O throttling).
I get forced to wait for a minute or more to get rid of the bloat, sometimes it stays unresponsive for twelve minutes, and hence I get frustrated. The fact the OS being not able to well-handle multitasking, tends to reflect an absolutely weird and unacceptable behavior.
Not only this happens with Firefox, but with any javascript-interpreter application including Microsoft VSCode or angular-cli (ng serve
command) as well as any other resource-hungry thread of execution – such as the case of plantuml when generating a very large graph from a very complex UML diagram.
Today, the OS becomes totally unmanageable, after launching a data recovery software for an external HDD (over ext4 partition) that got lately unplugged from a bad USB port by little move.
I'm not able to tell the root cause behind such buggy behavior
I have many tabs opened in the browser, and 94% OS-partition usage as per df
output:
Filesystem 1K-blocks Used Available Use% Mounted on
udev 3964160 0 3964160 0% /dev
tmpfs 798164 3192 794972 1% /run
/dev/sda5 173466400 153224316 11407424 94% /
tmpfs 3990820 62936 3927884 2% /dev/shm
tmpfs 5120 4 5116 1% /run/lock
tmpfs 3990820 0 3990820 0% /sys/fs/cgroup
/dev/loop5 128 128 0 100% /snap/anbox-installer/24
/dev/loop2 128 128 0 100% /snap/anbox-installer/17
/dev/loop4 223616 223616 0 100% /snap/kde-frameworks-5/26
/dev/loop3 90624 90624 0 100% /snap/core/7169
/dev/loop7 223616 223616 0 100% /snap/kde-frameworks-5/25
/dev/loop8 90624 90624 0 100% /snap/core/7270
/dev/loop0 87552 87552 0 100% /snap/qownnotes/2160
/dev/loop1 241664 241664 0 100% /snap/kde-frameworks-5/27
tmpfs 798164 0 798164 0% /run/user/0
tmpfs 798164 32 798132 1% /run/user/1000
/dev/loop9 87552 87552 0 100% /snap/qownnotes/2176
/dev/sda3 188669948 187132488 1537460 100% /media/kais/DATA
/dev/sdb1 15142960 2091904 13051056 14% /media/kais/STORE N GO
As hardware, I'm using:
-
Intel Core i3 v2348M as per
lscpu
:Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian Address sizes: 36 bits physical, 48 bits virtual CPU(s): 4 On-line CPU(s) list: 0-3 Thread(s) per core: 2 Core(s) per socket: 2 Socket(s): 1 NUMA node(s): 1 Vendor ID: GenuineIntel CPU family: 6 Model: 42 Model name: Intel(R) Core(TM) i3-2348M CPU @ 2.30GHz Stepping: 7 CPU MHz: 905.312 CPU max MHz: 2300.0000 CPU min MHz: 800.0000 BogoMIPS: 4589.49 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 3072K NUMA node0 CPU(s): 0-3 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer xsave avx lahf_lm epb pti tpr_shadow vnmi flexpriority ept vpid xsaveopt dtherm arat pln pts
-
8 GB of RAM. (See
htop
output below). - 99.83 MHz of mainboard bus speed
-
500 GB internal HDD – This is the S.M.A.R.T. report from the
operating system:smartctl 6.6 2017-11-05 r4594 [x86_64-linux-4.15.0-33-generic] (local build) Copyright (C) 2002-17, Bruce Allen, Christian Franke, www.smartmontools.org === START OF INFORMATION SECTION === Model Family: Western Digital Blue Mobile Device Model: WDC WD5000LPVX-22V0TT0 Serial Number: WD-WXE1E13AAMR4 LU WWN Device Id: 5 0014ee 25db04ba7 Firmware Version: 01.01A01 User Capacity: 500,107,862,016 bytes [500 GB] Sector Sizes: 512 bytes logical, 4096 bytes physical Rotation Rate: 5400 rpm Device is: In smartctl database [for details use: -P show] ATA Version is: ACS-2 (minor revision not indicated) SATA Version is: SATA 3.0, 6.0 Gb/s (current: 6.0 Gb/s) Local Time is: Wed Aug 7 15:52:05 2019 CET SMART support is: Available - device has SMART capability. SMART support is: Enabled === START OF READ SMART DATA SECTION === SMART overall-health self-assessment test result: PASSED General SMART Values: Offline data collection status: (0x00) Offline data collection activity was never started. Auto Offline Data Collection: Disabled. Self-test execution status: ( 0) The previous self-test routine completed without error or no self-test has ever been run. Total time to complete Offline data collection: ( 8040) seconds. Offline data collection capabilities: (0x7b) SMART execute Offline immediate. Auto Offline data collection on/off support. Suspend Offline collection upon new command. Offline surface scan supported. Self-test supported. Conveyance Self-test supported. Selective Self-test supported. SMART capabilities: (0x0003) Saves SMART data before entering power-saving mode. Supports SMART auto save timer. Error logging capability: (0x01) Error logging supported. General Purpose Logging supported. Short self-test routine recommended polling time: ( 2) minutes. Extended self-test routine recommended polling time: ( 93) minutes. Conveyance self-test routine recommended polling time: ( 5) minutes. SCT capabilities: (0x7035) SCT Status supported. SCT Feature Control supported. SCT Data Table supported. SMART Attributes Data Structure revision number: 16 Vendor Specific SMART Attributes with Thresholds: ID# ATTRIBUTE_NAME FLAG VALUE WORST THRESH TYPE UPDATED WHEN_FAILED RAW_VALUE 1 Raw_Read_Error_Rate 0x002f 200 200 051 Pre-fail Always - 1 3 Spin_Up_Time 0x0027 149 143 021 Pre-fail Always - 1541 4 Start_Stop_Count 0x0032 057 057 000 Old_age Always - 43173 5 Reallocated_Sector_Ct 0x0033 200 200 140 Pre-fail Always - 0 7 Seek_Error_Rate 0x002e 200 200 000 Old_age Always - 0 9 Power_On_Hours 0x0032 083 083 000 Old_age Always - 12797 10 Spin_Retry_Count 0x0032 100 100 000 Old_age Always - 0 11 Calibration_Retry_Count 0x0032 100 100 000 Old_age Always - 0 12 Power_Cycle_Count 0x0032 091 091 000 Old_age Always - 9496 191 G-Sense_Error_Rate 0x0032 001 001 000 Old_age Always - 250 192 Power-Off_Retract_Count 0x0032 200 200 000 Old_age Always - 399 193 Load_Cycle_Count 0x0032 147 147 000 Old_age Always - 160989 194 Temperature_Celsius 0x0022 101 092 000 Old_age Always - 42 196 Reallocated_Event_Count 0x0032 200 200 000 Old_age Always - 0 197 Current_Pending_Sector 0x0032 200 200 000 Old_age Always - 0 198 Offline_Uncorrectable 0x0030 100 253 000 Old_age Offline - 0 199 UDMA_CRC_Error_Count 0x0032 200 200 000 Old_age Always - 0 200 Multi_Zone_Error_Rate 0x0008 100 253 000 Old_age Offline - 0 SMART Error Log Version: 1 No Errors Logged SMART Self-test log structure revision number 1 No self-tests have been logged. [To run self-tests, use: smartctl -t] SMART Selective self-test log data structure revision number 1 SPAN MIN_LBA MAX_LBA CURRENT_TEST_STATUS 1 0 0 Not_testing 2 0 0 Not_testing 3 0 0 Not_testing 4 0 0 Not_testing 5 0 0 Not_testing Selective self-test flags (0x0): After scanning selected spans, do NOT read-scan remainder of disk. If Selective self-test is pending on power-up, resume after 0 minute delay.
These are the results of resource usage per htop
:
1 [||||| 14.1%] Tasks: 286, 1497 thr; 2 running
2 [||||| 13.2%] Load average: 3.00 4.97 6.09
3 [||||| 12.5%] Uptime: 3 days, 16:12:35
4 [||| 9.3%]
Mem[|||||||||||||||||||5.09G/7.61G]
Swp[|||||||||||||||||||3.68G/4.65G]
PID USER PRI NI VIRT RES SHR S CPU% MEM% TIME+ Command
7006 jvb 20 0 6640M 102M 6780 S 5.3 1.3 18:53.18 java -Xmx3072m -X
8224 kais 20 0 4537M 771M 200M S 6.6 9.9 2h31:23 /usr/lib/firefox/
2299 kais 20 0 2958M 184M 42912 S 5.3 2.4 13:54.41 /usr/lib/firefox/
1216 root 20 0 519M 120M 94640 S 5.3 1.5 1h52:50 /usr/lib/xorg/Xor
28401 kais 20 0 3354M 584M 107M S 7.9 7.5 34:44.51 /usr/lib/firefox/
8439 kais 20 0 4537M 771M 200M S 4.6 9.9 37:06.21 /usr/lib/firefox/
8831 kais 20 0 3222M 351M 64828 R 4.0 4.5 11:19.87 /usr/lib/firefox/
7025 jvb 20 0 6640M 102M 6780 S 0.0 1.3 0:18.34 java -Xmx3072m -X
7027 jvb 20 0 6640M 102M 6780 S 0.0 1.3 0:18.05 java -Xmx3072m -X
5901 kais 20 0 7492 5612 2904 R 4.0 0.1 0:00.66 htop
5329 kais 20 0 547M 47456 38388 S 1.3 0.6 0:01.29 /usr/lib/gnome-te
13540 kais 20 0 2958M 184M 42912 S 2.0 2.4 0:06.25 /usr/lib/firefox/
16897 kais 20 0 904M 28292 18076 S 2.0 0.4 50:08.37 pavucontrol
17999 kais 20 0 2424M 29460 25380 S 1.3 0.4 52:41.73 /usr/bin/pulseaud
F1 Help F2 Setup F3 Search F4 Filter F5 Tree F6 SortBy F7 Nice - F8 Nice + F9 Kill F10 Quit
Those are the results of VM statistics as well, generated by the command vmstat 5
.
AFAIK, bloatware shouldn't make the OS unresponsive, so I wouldn't consider or even accept that the bloatware is the root cause of the problem – since the OS job is isolating processes and ensuring multitasking.
I don't know whether this issue is OS-specific, hardware-specific, or configuration-specific.
Any ideas?
Best Answer
Overcommitting available RAM, which causes a large amount of swapping, can definitely do this. Remember that random access I/O on your mechanical HDD requires moving a read/write head, which can only do around 100 seeks per second.
It's usual for Linux to go totally out to lunch, if you overcommit RAM "too much". I also have a spinny disk and 8GB RAM. I have had problems with a couple of pieces of software with memory leaks. I.e. their memory usage keeps growing over time and never shrinks, so the only way to control it would have been to stop the software and then restart it. Based on the experiences I had during this, I am not very surprised to hear delays over ten minutes, if you are generating 3GB+ of swap.
You won't necessarily see this in all cases where you have more than 3GB of swap. Theory says the key concept is thrashing. On the other hand, if you are trying to switch between two different working sets, and it requires swapping 3GB in and out, at 100MB/s it will take at least 60 seconds even if the I/O pattern can be perfectly optimized. In practice, the I/O pattern will be far from optimal.
After the difficulty I had with this, I reformatted my swap space to 2GB (several times smaller than before), so the system would not be able to swap as deeply. You can do this even without messing around resizing the partition, because
mkswap
takes an optional size parameter.The rough balance is between running out of memory and having processes get killed, and having the system hang for so long that you give up and reboot anyway. I don't know if a 4GB swap partition is too large; it might depend what you're doing. The important thing is to watch out for when the disk starts churning, check your memory usage, and respond accordingly.
Checking memory usage of multi-process applications is difficult. To see memory usage per-process without double-counting shared memory, you can use
sudo atop -R
, press M and m, and look in the PSIZE column. You can also usesmem
.smem -t -P firefox
will show PSS of all your firefox processes, followed by a line with total PSS. This is the correct approach to measure total memory usage of Firefox or Chrome based browsers. (Though there are also browser-specific features for showing memory usage, which will show individual tabs).