Linux – Massive, unpredictable I/O performance drop in Linux

io linux performance

I have been using Debian testing without any problems for about 6 years (I just update it regularly), but recently it started to show random behaviour that can be summarized as "low I/O performance which persists until reboot".

The problem is that all disk reads and writes suddenly slow down to ~5MB/sec, which results in continuous reads and writes. Since the rate is so low, the disks are not mechanically challenged or stressed, but everything stays slow until I reboot.

The I/O subsystem of the computer consists of one OCZ Vertex 3 SSD and two WD Caviar Black HDDs. The SSD holds the read-heavy part of the OS and a partition on one of the HDDs holds the rest.

To diagnose the problem I tried the following without success:

  • top doesn't show any runaway activity in either CPU or I/O usage.
  • hdparm returns normal performance ratings for the disks (I only checked -t though).
  • smartctl doesn't show any performance problems with the disks. Long tests showed that the disks are as good as new.
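
One more check that fits with the list above is to look at per-device throughput directly. The following is only a minimal sketch, assuming a Linux kernel that exposes /proc/diskstats in the usual layout (device name in field 3, sectors read in field 6, sectors written in field 10, sectors counted in 512-byte units). The function name disk_mb is hypothetical and not part of any tool.

```shell
#!/bin/sh
# Print cumulative MB read and MB written per device.
# $1 = diskstats-format file to read (defaults to /proc/diskstats)
disk_mb() {
    awk '{ printf "%s %.1f %.1f\n", $3, $6 * 512 / 1048576, $10 * 512 / 1048576 }' \
        "${1:-/proc/diskstats}"
}

# Only try the live file where it exists (Linux).
if [ -r /proc/diskstats ]; then
    disk_mb
fi
```

Running it twice a few seconds apart and dividing the difference by the elapsed time gives a rough MB/sec figure per device, which could be compared against the ~5MB/sec seen during the slowdown.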

The system has a Z77 chipset, 16GB of RAM and an Intel i7 3770K CPU, and the stats show no signs of saturation in RAM, I/O or CPU, but I'm not experienced enough to debug problems like this (esp. in kernel space). Any help will be appreciated.

Update 1:

  • I ran a (forced) fsck on every partition as a precaution. All filesystems are clean.
  • Incidentally, I found a BIOS upgrade which came out a month ago and applied it.
  • No partition is filled more than 50%.

Update 2:

The problem has not resurfaced for two days. Either fsck or the BIOS update cleared some clog in the system. I'm still monitoring the issue and will close the question with a post-mortem answer.

Update 3:

The problem just resurfaced and I did some more digging. Please see the answer.

Best Answer

I managed to reproduce the problem again and it was the result of a big disk cache. My disk cache can grow to more than 8GB, and it seems that some applications don't like that and I/O suffers.

Dropping the disk caches with echo 3 > /proc/sys/vm/drop_caches as root remedies the problem. I currently don't know why a large disk cache causes this I/O degradation.
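
The workaround can be wrapped up roughly as follows. This is a minimal sketch, assuming a Linux system with /proc/meminfo. The function names meminfo_kb and drop_caches are hypothetical helpers, and running sync first is an addition of this sketch (drop_caches only frees clean pages, so flushing dirty data beforehand lets more of the cache be released).

```shell
#!/bin/sh
# Print the value (in kB) of one /proc/meminfo field.
# $1 = field name (e.g. Cached, Dirty)
# $2 = file to read (defaults to /proc/meminfo)
meminfo_kb() {
    awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# Flush dirty pages to disk, then drop the page cache plus
# dentries and inodes. Must be run as root.
drop_caches() {
    sync
    echo 3 > /proc/sys/vm/drop_caches
}

# Report the current cache size where /proc/meminfo exists.
if [ -r /proc/meminfo ]; then
    echo "Cached: $(meminfo_kb Cached) kB, Dirty: $(meminfo_kb Dirty) kB"
fi
```

Calling drop_caches as root and then printing the Cached value again shows how much was released.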

Last Update: After more investigation I found that the number of files in the cache was triggering the problem. The system was thrashing the disks while trying to commit many small files back to disk. Since I had been using this installation for ten years, I took the plunge and reinstalled with 64-bit Debian. Now it's working smoothly. It was probably a side effect of ten years of upgrades combined with hitting the limits of a 32-bit operating system.
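
For anyone wanting to check whether the number of cached file entries is unusually high, here is a minimal sketch. It assumes the kernel exposes /proc/sys/fs/dentry-state (first two fields: total and unused dentries) and /proc/sys/fs/inode-nr (first two fields: allocated and free inodes). The helper name first_two is hypothetical, and this is not the exact method used in the investigation above.

```shell
#!/bin/sh
# Print the first two numbers from a kernel counter file.
# $1 = file to read
first_two() {
    awk 'NR == 1 { print $1, $2 }' "$1"
}

# Report dentry and inode counts where the files exist (Linux).
for f in /proc/sys/fs/dentry-state /proc/sys/fs/inode-nr; do
    if [ -r "$f" ]; then
        echo "$f: $(first_two "$f")"
    fi
done
```

If these counts keep growing before a slowdown, echo 2 > /proc/sys/vm/drop_caches (as root) frees only dentries and inodes, which may help separate this effect from the page cache itself.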
