Linux disk access slows down system

hard drive, linux, performance, swap, ubuntu-10.04

My system slows down even when the CPU load is less than 100%, and I think it's doing this because it's writing to the swap partition. Yes, my swap partition is on a different disk than my OS. I remember this type of slowdown problem from Windows using ATA disks. It was solved by using DMA mode. I'm not sure if my disks are using DMA mode. They are SATA drives, so I assumed that they are. This is the output from hdparm:

/dev/sda:
 multcount     =  0 (off)
 IO_support    =  1 (32-bit)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 182401/255/63, sectors = 2930277168, start = 0

/dev/sdb:
 multcount     =  0 (off)
 IO_support    =  1 (32-bit)
 readonly      =  0 (off)
 readahead     = 256 (on)
 geometry      = 30401/255/63, sectors = 488397168, start = 0

The last time I saw this type of slowdown behavior was from Windows 3.1!

The output from hdparm -i /dev/sda /dev/sdb is this:

/dev/sda:

 Model=ST31500541AS, FwRev=CC34, SerialNo=6XW0N2LJ
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs RotSpdTol>.5% }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=4
 BuffType=unknown, BuffSize=0kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=2930277168
 IORDY=on/off, tPIO={min:120,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=yes: unknown setting WriteCache=enabled
 Drive conforms to: unknown:  ATA/ATAPI-4,5,6,7

 * signifies the current active mode


/dev/sdb:

 Model=HDT722525DLA380, FwRev=V44OA96A, SerialNo=VDB41BT4EUH03C
 Config={ HardSect NotMFM HdSw>15uSec Fixed DTR>10Mbs }
 RawCHS=16383/16/63, TrkSize=0, SectSize=0, ECCbytes=52
 BuffType=DualPortCache, BuffSize=7674kB, MaxMultSect=16, MultSect=off
 CurCHS=16383/16/63, CurSects=16514064, LBA=yes, LBAsects=488397168
 IORDY=on/off, tPIO={min:240,w/IORDY:120}, tDMA={min:120,rec:120}
 PIO modes:  pio0 pio1 pio2 pio3 pio4 
 DMA modes:  mdma0 mdma1 mdma2 
 UDMA modes: udma0 udma1 udma2 udma3 udma4 udma5 *udma6 
 AdvancedPM=yes: disabled (255) WriteCache=enabled
 Drive conforms to: ATA/ATAPI-7 T13 1532D revision 1:  ATA/ATAPI-2,3,4,5,6,7

 * signifies the current active mode

When the system slows down this is the output from free:

             total       used       free     shared    buffers     cached
Mem:       3538356    3057180     481176          0       8588     280412
-/+ buffers/cache:    2768180     770176
Swap:      5124692    1563140    3561552

Best Answer

You haven't asked a question. I suppose you meant to ask “why is my system slow” or “how can I make it faster”, both of which are too vague to be answerable. I'm going to partially address a less ambitious question, “How can I investigate my performance bottleneck”.

Apparently there's a lot of disk activity when your system is unresponsive. My answer is based on this.

RAM and swap

No matter how fast a disk you have, swapping is going to be slow. There isn't much you can do to make swap visibly faster. If your system is swapping, the only realistic cures are to use less memory, or buy more memory.

Interpreting the output of free

You can get a snapshot of how much memory your system is using with the command free. It shows something like this:

             total       used       free     shared    buffers     cached
Mem:       3538356    3057180     481176          0       8588     280412
-/+ buffers/cache:    2768180     770176
Swap:      5124692    1563140    3561552
  • The Mem, total figure (here 3538356, which is about 3.3GB) is the amount of RAM available to processes (this excludes memory used by the video card or by the kernel).

    (People with a 64-bit kernel can skip this paragraph.) Due to complexities in the x86 architecture, there are several ways for the kernel to access the RAM. These days, in practice, you can choose between two modes: PAE mode, which allows the kernel to use up to 64GB of RAM; and non-PAE mode, which only allows the kernel to use about 3GB. The reason the non-PAE mode exists is that the PAE mode has a memory usage overhead, which is only worth the cost if you do have more than 3GB of RAM. Concerned Ubuntu users should read the PAE page in the Ubuntu wiki.

  • The Mem, free figure shows how much memory is not used for anything. It's usually fairly small (say, 10–50MB on a multi-GB-RAM system) unless the RAM is underused or the system has just booted. Here the figure is fairly high (about 470MB), probably because an application using about that much memory was closed recently. Don't worry, it'll fill up soon.

  • The line headed -/+ buffers/cache shows how much memory is used by processes, as opposed to disk cache. Here, we see that there is only about 750MB available for the cache. That's only a little over 20% of the RAM, which is not much. Having a lot of RAM used for the disk cache is important to keep a system responsive.

  • The last line indicates how much swap is in use. It's normal to have some swap in use, even if the RAM is not full. Linux copies memory to swap preventively when the disk is idle, in case the memory is needed later, at a time when the disk might not be idle.

    Linux often moves process memory to swap in order to make room for the disk cache. This is normal system behavior, and trying to tone it down can lead to your system being slower. There is a tunable setting for how much Linux should swap, called vm.swappiness; if you experiment with it, be sure to try increasing swappiness as well as decreasing it.

    In our example free output, we see that about 36% of process memory is in swap. Whether that hurts performance depends on what that memory is used for. If it belongs to one big application that's not currently in use, it doesn't hurt. If memory belonging to processes in active use is swapped out, the system can be very unresponsive.
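The arithmetic behind these bullets can be checked with a short script. This is only a sketch: it assumes the older free layout shown above (with a "-/+ buffers/cache" line and values in kB), and the function name summarize_free and the keys it returns are made up for illustration. Newer versions of free print a different layout, which this parser does not handle.

```python
FREE_OUTPUT = """\
             total       used       free     shared    buffers     cached
Mem:       3538356    3057180     481176          0       8588     280412
-/+ buffers/cache:    2768180     770176
Swap:      5124692    1563140    3561552
"""

def summarize_free(text):
    """Parse the old-style `free` layout (values in kB) into a few ratios."""
    rows = {}
    for line in text.splitlines():
        # Only lines with a "Label:" prefix carry numbers; the header has none.
        label, sep, rest = line.partition(":")
        if sep:
            rows[label.strip()] = [int(v) for v in rest.split()]
    total = rows["Mem"][0]
    used_by_processes, available = rows["-/+ buffers/cache"]
    swap_used = rows["Swap"][1]
    return {
        "total_gib": total / 1024 ** 2,
        "free_mib": rows["Mem"][2] / 1024,
        "available_mib": available / 1024,
        # Share of RAM that is free or could be used as disk cache.
        "available_pct": 100 * available / total,
        # Share of process memory (RAM + swap) that currently sits in swap.
        "swapped_pct": 100 * swap_used / (used_by_processes + swap_used),
    }

for name, value in summarize_free(FREE_OUTPUT).items():
    print(f"{name}: {value:.1f}")
```

The "swapped_pct" figure treats all used swap as process memory, which is a simplification.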

So what's using all this memory?

Analyzing memory usage is hard. For each process, you can measure how much address space it allocates; you can (try to) measure how much memory it actively uses at a given time; you have to keep track of file-backed memory (e.g., code loaded from the process executable and libraries) and non-file-backed memory (the process's stack and heap). And of course a significant amount of memory is shared between processes, so it doesn't make that much sense to talk about how much memory a given process is using.

You can get a picture of memory usage with command-line tools like top and htop, or with any number of graphical system monitors and performance meters. For htop, if you're interested in memory usage, turn on the “hide kernel threads” and “hide userland threads” options in Setup/Display options.

In the display of top or htop, the relevant columns are VIRT and RES. VIRT indicates how much address space the process has allocated, including shared and allocated-but-not-used memory; don't worry too much about it. RES indicates how much RAM (i.e., not counting swap) a process is currently using.
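If you'd rather read these figures from a script than from top, the kernel exposes them per process in /proc/<pid>/status: VmSize corresponds to VIRT and VmRSS to RES. The sketch below is one way to pull them out. The function names are hypothetical, and the VmSwap field (swapped-out memory per process) only appears on kernels newer than the one Ubuntu 10.04 ships, so don't rely on it being there.

```python
def memory_fields(status_text):
    """Extract the Vm* fields (in kB) from the text of /proc/<pid>/status."""
    fields = {}
    for line in status_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.startswith("Vm"):
            # The value looks like "   123456 kB"; keep only the number.
            fields[key] = int(value.split()[0])
    return fields

def process_memory(pid="self"):
    """Read the Vm* fields of a running process (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return memory_fields(f.read())
```

Calling process_memory() with no argument reports on the Python interpreter itself; pass a PID taken from top to inspect another process. Kernel threads have no Vm* lines, so the result is empty for them.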

One way to see if your system is actively swapping is to watch the top display while you work. If you see the RES figure rise for some processes while it's decreasing for other processes, it means the latter processes are being swapped out to make room for the former. If that happens often, you need more RAM to be comfortable with your usage patterns.
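Watching RES is indirect. A different technique, not the one described above, is to sample the kernel's cumulative swap counters, pswpin and pswpout in /proc/vmstat, twice and look at the difference. Here is a minimal sketch, assuming those two counters are present; the function names and the 5-second default interval are arbitrary choices for illustration.

```python
import time

def swap_counters(vmstat_text):
    """Return the cumulative (pages swapped in, pages swapped out) counters."""
    # Each non-empty line of /proc/vmstat is "name value".
    counters = dict(line.split() for line in vmstat_text.splitlines() if line.strip())
    return int(counters["pswpin"]), int(counters["pswpout"])

def swap_rate(before, after, seconds):
    """Pages per second swapped in and out between two samples."""
    return ((after[0] - before[0]) / seconds, (after[1] - before[1]) / seconds)

def sample_swap_rate(interval=5.0, path="/proc/vmstat"):
    """Take two samples `interval` seconds apart and return the rates."""
    with open(path) as f:
        before = swap_counters(f.read())
    time.sleep(interval)
    with open(path) as f:
        after = swap_counters(f.read())
    return swap_rate(before, after, interval)
```

Rates that stay near zero while the system feels slow suggest the bottleneck is something other than swapping. The si and so columns of vmstat 1 give similar information without any scripting.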

Disk speed

From past experience you suspect that your disk is not using the fastest available access mode. But Linux is not Windows (which especially in the old days often required third-party drivers). Installing a Linux distribution almost always gives you the fastest access modes for all peripherals that are not known to cause crashes or data loss. (Video drivers are somewhat of an exception.)

You can confirm the DMA mode being used by your system with hdparm -i. For example, your disks both show UDMA modes: … *udma6, meaning that they are using the fastest available mode. (As a general rule, UDMA is faster than DMA is faster than PIO; and for the numeric part, higher is faster.)
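If you have several disks to check, the starred entry can be picked out of the hdparm -i output mechanically. The following is a rough sketch that assumes the output format shown in the question (a "/dev/…:" header per device and a "*" in front of the active mode); active_modes is a made-up helper name, not part of hdparm.

```python
import re

def active_modes(hdparm_output):
    """Map each device in `hdparm -i` output to the mode marked with '*'."""
    result = {}
    device = None
    for line in hdparm_output.splitlines():
        line = line.strip()
        if line.startswith("/dev/") and line.endswith(":"):
            device = line[:-1]  # drop the trailing colon
        elif device and "modes:" in line:
            # Only the PIO/DMA/UDMA lines contain "modes:"; the legend line does not.
            match = re.search(r"\*(\w+)", line)
            if match:
                result[device] = match.group(1)
    return result
```

Fed the output from the question, this should report udma6 for both /dev/sda and /dev/sdb.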

You can measure your disk's raw throughput with hdparm -t. The number in itself doesn't directly give information about how responsive your system will be, but it can be useful to compare the speed of two disks, or of two access modes on the same disk.
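For a rough comparison without hdparm, you can time a large sequential read yourself. The sketch below is not a replacement for hdparm -t: it does nothing to flush the kernel's disk cache, so reading a file that was accessed recently will give an inflated figure, and reading a raw device such as /dev/sda requires root. The function name and the default sizes are assumptions chosen for illustration.

```python
import time

def read_throughput(path, chunk_size=1024 * 1024, max_bytes=256 * 1024 * 1024):
    """Read up to max_bytes sequentially from path; return (bytes_read, MB/s)."""
    total = 0
    start = time.perf_counter()
    # buffering=0 avoids Python's own buffer; the kernel cache still applies.
    with open(path, "rb", buffering=0) as f:
        while total < max_bytes:
            chunk = f.read(min(chunk_size, max_bytes - total))
            if not chunk:
                break  # end of file or device
            total += len(chunk)
    elapsed = time.perf_counter() - start
    rate = (total / 1e6) / elapsed if elapsed > 0 else float("inf")
    return total, rate
```

As with hdparm -t, the number is mainly useful for comparing two disks or two settings on the same machine, not as an absolute measure of responsiveness.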

Sometimes your computer's BIOS lets you choose between different operating modes for your disks. When the first SATA disks came out, many operating systems (such as Windows and Linux) didn't come with suitable drivers. So BIOSes came with an option to use a PATA emulation mode that was in principle slower but more compatible. At the time, disks weren't fast enough to saturate a SATA connection anyway, but these days they might. The compatible setting is often called “IDE” or “ATAPI” while the faster setting is usually called “AHCI”.

Disk errors

One possible source of disk-related slowdown is when a disk is failing and the system needs to retry accessing it several times. Sometimes the slowdown is the first manifestation, and actual errors in the form of unreadable files come later.

To see if this is the case, look in the kernel logs, typically in /var/log/kern.log. If you see lines like

end_request: I/O error, dev sda, sector 123456789
ata3.00: error: { UNC }

make sure your backups are up-to-date and replace your disk immediately. Note that the lines above are examples only; there is a lot of variety in error messages.
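Searching the log by hand gets tedious, so here is a small sketch that filters a log for lines resembling the two examples. The patterns are assumptions derived only from those sample lines; real kernel messages differ between drivers and kernel versions, so an empty result does not prove the disk is healthy. The function names are hypothetical.

```python
import re

# Illustrative patterns only, based on the two sample lines above;
# real kernel messages vary between drivers and kernel versions.
ERROR_PATTERNS = [
    re.compile(r"I/O error, dev (\w+), sector (\d+)"),
    re.compile(r"ata\d+\.\d+: error: \{.*\}"),
]

def find_disk_errors(lines):
    """Return the log lines that match any of the patterns above."""
    return [line.rstrip("\n") for line in lines
            if any(p.search(line) for p in ERROR_PATTERNS)]

def scan_log(path="/var/log/kern.log"):
    """Scan a kernel log file; reading it may require elevated privileges."""
    with open(path, errors="replace") as f:
        return find_disk_errors(f)
```

If scan_log() returns anything, read the surrounding lines in the log itself before drawing conclusions.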

You can get a report on the health of your disk with SMART monitoring tools.
