Why do systems become slow when doing massive writes to disk

io performance

I want to know why systems become slow when writing large amounts of data to disk.

I think that for a system to become slow, there should be some issue with the CPU. But writes are purely I/O-bound.

Do hardware interrupts occur when writing data? If so, is it because of those interrupts that the CPU is constantly context switching?

Best Answer

The core reason is the usual one: I/O is much slower than the CPU and RAM. Even if the processes doing I/O use DMA (which offloads the CPU), at some point they are likely to have to wait for their requests to complete.
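To make that waiting visible from userspace, here is a minimal Python sketch (the temporary file and the 256 MiB size are arbitrary choices of mine): the write() calls mostly just land in the page cache, while fsync() blocks until the device has actually completed the requests.

    # Minimal sketch: time how long a process spends waiting for the disk.
    # The 256 MiB size and the temporary file location are arbitrary.
    import os
    import tempfile
    import time

    CHUNK = b"\0" * (1024 * 1024)   # 1 MiB of zeroes
    TOTAL_MIB = 256                 # total amount to write

    with tempfile.NamedTemporaryFile() as f:
        start = time.monotonic()
        for _ in range(TOTAL_MIB):
            f.write(CHUNK)          # usually fast: data lands in the page cache
        buffered = time.monotonic()

        os.fsync(f.fileno())        # now the process must wait for the device
        synced = time.monotonic()

    print(f"write() calls took {buffered - start:.2f}s (mostly RAM speed)")
    print(f"fsync() took       {synced - buffered:.2f}s (actual disk wait)")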

In the most common case of an HDD, just add several applications trying to access files scattered around the drive, and you can go make yourself a coffee (tea, whatever). With SSDs the situation gets better, but even an SSD - which has throughput measured in hundreds of MB/s over SATA (compared to tens of MB/s for a spinning-platter HDD) and negligible seek times (compared to milliseconds for spinning platters) - can become a bottleneck.
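To get a feel for how much the access pattern matters, a rough Python sketch along these lines can compare sequential and random reads. The file name is a placeholder, and for meaningful numbers the file should be larger than RAM (or the page cache dropped beforehand), otherwise caching hides the seeks.

    # Rough sketch comparing sequential and random reads from an existing file.
    # "testfile" is a placeholder; it should be larger than RAM for the
    # difference to reflect the drive rather than the page cache.
    import os
    import random
    import time

    PATH = "testfile"        # hypothetical large file
    BLOCK = 4096
    COUNT = 20_000

    size = os.path.getsize(PATH)
    offsets_seq = [i * BLOCK for i in range(COUNT)]
    offsets_rand = [random.randrange(0, size - BLOCK) for _ in range(COUNT)]

    def timed_reads(offsets):
        fd = os.open(PATH, os.O_RDONLY)
        start = time.monotonic()
        for off in offsets:
            os.pread(fd, BLOCK, off)   # one small read per offset
        os.close(fd)
        return time.monotonic() - start

    print(f"sequential: {timed_reads(offsets_seq):.2f}s")
    print(f"random:     {timed_reads(offsets_rand):.2f}s")

On a spinning-platter drive the random case is dominated by seek time; on an SSD the gap shrinks but does not disappear entirely.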

The problem as I understand it is not just the data transfers themselves, but the necessary overhead: I/O is controlled by the kernel, but it seldom happens without involving userspace. Thus there can be plenty of context switches, just from the applications waiting on I/O checking whether anything is happening (this depends on the implementation, of course). In the case of disk transfers, there may well be several kernel threads competing for resources or busy-waiting (which is sometimes an appropriate strategy). Remember that, for example, copying data from one partition to another requires a modern filesystem to: find out where the source data is, read it, allocate space on the target filesystem, write metadata, write the data, and repeat until finished.
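As a rough illustration of the userspace half of such a copy, here is a minimal Python sketch (the paths and chunk size are made up). Every read()/write() is a system call during which the process may be put to sleep and switched out while the kernel locates extents, allocates blocks and updates metadata.

    # Sketch of the userspace side of "copy a file to another partition".
    # Each read()/write() is a system call that may block and cause a
    # context switch while the kernel does the real work.
    # Source and destination paths are placeholders.
    import os

    SRC = "/mnt/src/bigfile"     # hypothetical source on one partition
    DST = "/mnt/dst/bigfile"     # hypothetical destination on another
    CHUNK = 1024 * 1024          # copy in 1 MiB chunks

    src_fd = os.open(SRC, os.O_RDONLY)
    dst_fd = os.open(DST, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)

    while True:
        data = os.read(src_fd, CHUNK)    # kernel: find extents, read from source
        if not data:
            break                        # repeat until finished
        os.write(dst_fd, data)           # kernel: allocate space, write data+metadata
                                         # (a real tool would handle short writes)

    os.fsync(dst_fd)                     # wait until everything reaches the disk
    os.close(src_fd)
    os.close(dst_fd)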

And if, at some point, your system starts swapping (which usually has higher priority than regular I/O), the disaster is complete.

EDIT: After talking to some Linux kernel developers, the situation became a bit clearer. The main problem is the I/O scheduler, which does not have much idea about which I/O to prioritise. Hence any user input, and the graphical output that follows it, shares the queue with the disk/network activity. As a consequence, the kernel may also throw cached process data (e.g. loaded libraries) out of the page cache when it concludes it can use the page cache more effectively for other I/O. That of course means that once that code needs to run again, it has to be fetched again - from a disk that may already be under heavy load.
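One thing an application doing bulk writes can do to soften this - sketched below in Python under the assumption of a Linux/Unix system with Python 3.3+ - is to advise the kernel that the freshly written pages will not be needed again, so they can be dropped from the page cache instead of pushing out more useful data. The output path and sizes here are arbitrary.

    # Possible userspace mitigation (Linux/Unix, Python >= 3.3): tell the
    # kernel a bulk write won't be re-read soon, so its pages can be
    # dropped from the page cache instead of evicting more useful data.
    # The output path and chunk size are arbitrary.
    import os

    PATH = "bulk_output.bin"             # hypothetical large output file
    CHUNK = b"\0" * (4 * 1024 * 1024)    # 4 MiB per write

    fd = os.open(PATH, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    written = 0
    for _ in range(64):                  # 256 MiB total, arbitrary
        written += os.write(fd, CHUNK)

    os.fsync(fd)                         # make sure the pages are clean first
    os.posix_fadvise(fd, 0, written, os.POSIX_FADV_DONTNEED)  # then let them go
    os.close(fd)

This is only a per-application workaround, of course; it does not change how the I/O scheduler itself prioritises requests.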

That said, as far as the Linux kernel goes, many of these issues have been fixed recently (the problem has been known), so say 4.4.x or 4.5.x should behave better than it used to, and problems should be reported (generally the kernel people are happy when someone wants to help by reporting bugs and testing).
