Why are processes blocked by I/O under heavy system load?

I have a workstation (2x Intel Xeon family CPUs and 128 GiB of RAM) running several virtual machines. While the combined CPU usage is below 30%, the load average is between 20 and 25. For example, if I run tar -xzvf vm_data.tgz --directory vm4/ --strip-components=1, the gzip process spends 90% to 99% of its time blocked on I/O and the command takes forever to complete.

On the other hand, the actual reads and writes to disk are very low compared to the hardware limits of SATA 3.0 or of the SSD (I'm using a single Kingston SA400S37960G SSD).

What might cause a process (gzip in my example) to wait on I/O while the actual disk reads and writes appear to be very low? My first thought was that the system interrupt rate might be very high and that this was blocking the I/O, but according to /proc/interrupts this does not seem to be the case, as none of the counters are increasing rapidly.

Best Answer

I had a very similar issue many years ago with our production MySQL database. It turned out that its files were extremely fragmented, and backing them up resulted in all other disk operations taking forever to complete.

Please post the output of:

find vm4 -type f | while IFS= read -r filename; do sudo filefrag "$filename" | grep -Ev ": 1 extent|: 0 extents"; done | sort
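
If the output is long, it may help to see the worst files first. The following is a minimal sketch of a filter that does this. The function name fragmented_only is made up for illustration, and it assumes filefrag prints one "<path>: <N> extent(s) found" line per file, which is the format the command above already relies on.

```shell
#!/bin/sh
# Hypothetical helper: read filefrag-style lines on stdin and keep only
# files with more than one extent, sorted by extent count (highest first).
# Assumed input format: "<path>: <N> extent(s) found"
fragmented_only() {
    # Prefix each line with its extent count, sort on it, then drop the prefix.
    awk -F': ' '{ n = $NF + 0; if (n > 1) print n "\t" $0 }' \
        | sort -rn \
        | cut -f2-
}

# Usage (vm4 is the directory from the question; filefrag usually needs root):
# sudo find vm4 -type f -exec filefrag {} + | fragmented_only
```

Sorting by extent count puts the most fragmented VM images at the top, which are the ones most worth looking at.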

If my guess turns out to be correct, you'll need to defragment the VM files to resolve the issue.
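
Which tool does the defragmenting depends on the filesystem holding the VM files, which the question does not state. As a rough sketch, the helper below (the name defrag_tool_for is hypothetical) maps a filesystem type, as reported by findmnt, to the tool commonly used for it. The mapping covers only ext4, XFS and Btrfs and is an assumption about your setup, not a recommendation to run any of these blindly.

```shell
#!/bin/sh
# Hypothetical helper: print the usual defragmentation tool for a
# filesystem type. Only three common Linux filesystems are covered.
defrag_tool_for() {
    case "$1" in
        ext4)  echo "e4defrag" ;;
        xfs)   echo "xfs_fsr" ;;
        btrfs) echo "btrfs filesystem defragment" ;;
        *)     echo "unknown"; return 1 ;;
    esac
}

# Usage (vm4 is the directory from the question):
# fstype=$(findmnt -n -o FSTYPE -T vm4)
# defrag_tool_for "$fstype"
```

On ext4, for instance, e4defrag -c vm4 only reports the current fragmentation without changing anything, which makes it a reasonable first step before defragmenting for real.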
