Poor random read SSD performance on Linux

performance, ssd

I recently got an Intel 320-series SSD, and I am having difficulty achieving the advertised 38K IOPS for random 4K reads.

Both with fio and my own hacked-together program, I am seeing around 6K IOPS. It's almost as if the I/O depth does not matter and the kernel is fetching one block at a time.

Example:

$ cat job
[randread]
filename=/dev/sdb2
rw=randread
size=128m
blocksize=4k
ioengine=libaio
iodepth=64
direct=1

$ sudo fio job
randread: (g=0): rw=randread, bs=4K-4K/4K-4K, ioengine=libaio, iodepth=64
Starting 1 process
Jobs: 1 (f=1): [r] [100.0% done] [25423K/0K /s] [6207/0 iops] [eta 00m:00s]
randread: (groupid=0, jobs=1): err= 0: pid=4678
  read : io=131072KB, bw=24852KB/s, iops=6213, runt=  5274msec
    slat (usec): min=1, max=94, avg= 5.00, stdev= 2.88
    clat (usec): min=312, max=13070, avg=10290.25, stdev=1399.78
    bw (KB/s) : min=23192, max=24464, per=97.08%, avg=24125.60, stdev=383.70
  cpu          : usr=15.74%, sys=22.57%, ctx=31642, majf=0, minf=88
  IO depths    : 1=0.1%, 2=0.1%, 4=0.1%, 8=0.1%, 16=0.1%, 32=0.1%, >=64=99.8%
     submit    : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.0%, >=64=0.0%
     complete  : 0=0.0%, 4=100.0%, 8=0.0%, 16=0.0%, 32=0.0%, 64=0.1%, >=64=0.0%
     issued r/w: total=32768/0, short=0/0
     lat (usec): 500=0.01%, 750=0.01%, 1000=0.03%
     lat (msec): 2=0.05%, 4=0.10%, 10=20.10%, 20=79.70%

Run status group 0 (all jobs):
   READ: io=131072KB, aggrb=24852KB/s, minb=25448KB/s, maxb=25448KB/s, mint=5274msec, maxt=5274msec

Disk stats (read/write):
  sdb: ios=30453/0, merge=850/0, ticks=319060/0, in_queue=319060, util=98.09%

The system is Linux 2.6.35-31-generic #63-Ubuntu SMP Mon Nov 28 19:29:10 UTC 2011 x86_64 GNU/Linux. /dev/sdb2 above is a ~10GB partition on an 80GB SSD. fio is version 1.38.

Would really appreciate thoughts on what might be wrong.

PS: The partition in the above test (/dev/sdb2) is aligned on a 4K boundary. Reading from a larger span (size=10g) does not help.
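
For reference, one way to check the alignment (assuming 512-byte logical sectors, in which case a start sector divisible by 8 means the partition begins on a 4K boundary):

$ cat /sys/block/sdb/sdb2/start      # partition start, always reported in 512-byte sectors
$ sudo blockdev --getss /dev/sdb     # logical sector size of the drive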

Best Answer

See the discussion at "Can I configure my Linux system for more aggressive file system caching?"

In short, you probably need to tune some device queue settings. My guess is that a scheduler or queue setting is being misdetected by the kernel, or has been set incorrectly by hand.

Try

grep . /sys/block/sd*/{queue/{nr_requests,nomerges,rotational,scheduler},device/queue_depth}

and

lsblk

to debug the issue. You should have queue_depth and nr_requests set to at least 31 if you can tolerate the extra per-read latency that NCQ introduces; nomerges should be zero, and rotational should be zero for an SSD. Choosing the right scheduler is harder, but for raw IOPS noop should be good enough; I find a correctly tuned cfq handles real-world workloads a bit better.
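
For example, something along these lines would apply those settings (run as root; this assumes the SSD is sdb as in your test, and the exact values are only a starting point):

echo noop > /sys/block/sdb/queue/scheduler    # simple FIFO scheduler, good for raw IOPS
echo 0 > /sys/block/sdb/queue/nomerges        # allow request merging
echo 0 > /sys/block/sdb/queue/rotational      # tell the kernel this is not a spinning disk
echo 128 > /sys/block/sdb/queue/nr_requests   # plenty of in-flight requests at the block layer
echo 31 > /sys/block/sdb/device/queue_depth   # full SATA NCQ depth

These do not persist across reboots; put them in a udev rule or an init script once you have found values that work.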

Also make sure the disk is connected over SATA with the controller in AHCI mode and that NCQ is enabled. Otherwise, there is little hope of getting full performance out of the drive.
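
One way to check that, assuming the relevant messages are still in the kernel log:

dmesg | grep -i ahci    # the controller should have come up in AHCI mode
dmesg | grep -i ncq     # look for a line like "NCQ (depth 31/32)" on the SSD's ATA port
lspci | grep -i sata    # the SATA controller entry usually mentions AHCI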
