Linux – Why Does Blktrace Only Write Blocks of 8?

block-device io linux

I want to understand the I/O pattern a database writes to disk so that I can decide how many disks to use for best performance. To analyse the pattern I want to use blktrace, but I have to grok it first. That is what I am trying here.

I have a USB stick that I attach to my computer and it becomes /dev/sdd. Now I start

dd if=/dev/sdd of=/dev/null

and in a separate window I start

blktrace -d /dev/sdd -o - | blkparse -i -

and expect to see read (R) operations that get merged (M) and put into the queue (Q). That works, but to my understanding the block size is always 8:

8,48   6    15257     2.157995037  2470  M   R 816696 + 8 [dd]
8,48   6    15258     2.157996273  2470  Q   R 816704 + 8 [dd]
8,48   6    15259     2.157996520  2470  M   R 816704 + 8 [dd]
8,48   6    15260     2.157997794  2470  Q   R 816712 + 8 [dd]

Now I stop everything and tell the system to read only one byte:

dd if=/dev/sdd of=/dev/null count=1 bs=1
1+0 records in
1+0 records out
1 byte (1 B) copied, 0.00325544 s, 0.3 kB/s

This shows up on the blkparse console like this:

8,48   6        1    17.220316681  2543  G   N [dd]
8,48   6        2    17.220317209  2543  I   N 0 (00 ..) [dd]
8,48   6        3    17.220317707  2543  D   N 0 (00 ..) [dd]
8,48   6        4    17.220787473  2543  Q   R 0 + 8 [dd]
8,48   6        5    17.220790545  2543  G   R 0 + 8 [dd]
8,48   6        6    17.220791330  2543  P   N [dd]
8,48   6        7    17.220793515  2543  Q   R 8 + 8 [dd]
8,48   6        8    17.220794597  2543  M   R 8 + 8 [dd]
8,48   6        9    17.220796134  2543  Q   R 16 + 8 [dd]
8,48   6       10    17.220796419  2543  M   R 16 + 8 [dd]
8,48   6       11    17.220797695  2543  Q   R 24 + 8 [dd]
8,48   6       12    17.220797943  2543  M   R 24 + 8 [dd]
8,48   6       13    17.220798862  2543  I   R 0 + 32 [dd]

What's going on here? Why does a read of one byte show up as 3 "R" requests, each with a Q and an M action? Why does it "seem to" read 32 or 24 bytes? Where is the documentation to educate me further?

Best Answer

Because you are doing buffered I/O, and the page cache works in whole pages, which are 4 KiB on PCs, or eight 512-byte sectors. The offsets and lengths that blkparse prints are counted in sectors, not bytes, so "+ 8" is one page and "0 + 32" is four pages (16 KiB). The kernel readahead mechanism also reads a bit more than was requested, on the assumption that dd will continue reading. If you want to avoid this, you need to use direct I/O by passing dd the iflag=direct option, but you won't be able to have it read a single byte that way: direct I/O must be aligned to, and a whole multiple of, the sector size.
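
To make the units concrete, here is a minimal Python sketch that converts the "sector + count" pair of a blkparse line into byte offsets and lengths. It assumes the default blkparse line layout shown in the question and 512-byte sectors. The names parse_request and direct_io_ok are made up for this illustration and are not part of blktrace. The alignment check uses 512 bytes as a stand-in for the device's logical block size, which may be larger on some devices.

```python
import re

SECTOR_SIZE = 512  # blktrace reports offsets and lengths in 512-byte sectors

# Matches the default blkparse layout from the question, e.g.
# "8,48   6   13   17.220798862  2543  I   R 0 + 32 [dd]"
LINE_RE = re.compile(
    r"^\s*(?P<dev>\d+,\d+)\s+(?P<cpu>\d+)\s+(?P<seq>\d+)\s+(?P<time>\d+\.\d+)\s+"
    r"(?P<pid>\d+)\s+(?P<action>\w+)\s+(?P<rwbs>\w+)\s+"
    r"(?P<sector>\d+)\s+\+\s+(?P<count>\d+)"
)


def parse_request(line):
    """Return (action, rwbs, byte_offset, byte_length).

    Returns None for lines that carry no "sector + count" pair,
    such as the G/I/D lines with the N flag in the question.
    """
    m = LINE_RE.match(line)
    if m is None:
        return None
    return (
        m["action"],
        m["rwbs"],
        int(m["sector"]) * SECTOR_SIZE,
        int(m["count"]) * SECTOR_SIZE,
    )


def direct_io_ok(offset, length, sector_size=SECTOR_SIZE):
    """Check the rule from the answer: offset and length must both be
    whole multiples of the sector size (assumed to be 512 bytes here)."""
    return length > 0 and offset % sector_size == 0 and length % sector_size == 0


if __name__ == "__main__":
    sample = "8,48   6       13    17.220798862  2543  I   R 0 + 32 [dd]"
    print(parse_request(sample))   # ('I', 'R', 0, 16384): four 4 KiB pages
    print(direct_io_ok(0, 1))      # False: a 1-byte read cannot use direct I/O
    print(direct_io_ok(0, 512))    # True: one whole sector
```

Read this way, the one-byte dd in the question became a single 16 KiB request starting at offset 0: the page that holds the byte, plus readahead.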
