Linux – Why is dd with oflag=direct slower when writing to a disk than to a file?

coreutils, filesystems, hard-disk, linux, performance

I am trying to compare the aggregate write rate when writing to a file in a GPFS file system with the rate when writing directly to a disk, on a system running Red Hat Enterprise Linux Server release 6.4 (Santiago). For my application I need to measure the raw rate, i.e. without taking advantage of the cache. I do not understand the effect of the direct option used with dd to bypass the cache: when writing directly to a block device I get a drastically lower rate with oflag=direct than when writing to a file in the GPFS file system. Why does this happen?

To measure the aggregate rate I start p dd processes that write concurrently to the block device or to files, and then sum the p rates obtained to get the aggregate write rate (the summation step is sketched after the script below).

    #!/bin/bash
    directdiskrate=~/scratch/rate5
    syncdiskrate=~/scratch/rate4
    filerate=~/scratch/rate3
    numruns=1
    numthreads=30

    # to disk, use both conv=fsync and oflag=direct
    writetodiskdirect="dd if=/dev/zero of=/dev/sdac bs=256k count=4096 conv=fsync oflag=direct iflag=fullblock"
    for p in $(seq $numthreads)
    do
        # parse dd output: the rate is on the last line, fields separated by commas
        $writetodiskdirect 2>&1 | tail -n 1 | awk 'BEGIN { FS = "," } ; { print $3 }' | sed -e 's/MB\/s//g' >> $directdiskrate &
    done
    wait

    # to disk, use only conv=fsync
    writetodisksync="dd if=/dev/zero of=/dev/sdac bs=256k count=4096 conv=fsync iflag=fullblock"
    for p in $(seq $numthreads)
    do
        # parse dd output: the rate is on the last line, fields separated by commas
        $writetodisksync 2>&1 | tail -n 1 | awk 'BEGIN { FS = "," } ; { print $3 }' | sed -e 's/MB\/s//g' >> $syncdiskrate &
    done
    wait

    # to file, use both conv=fsync and oflag=direct
    for p in $(seq $numthreads)
    do
        writetofile="dd if=/dev/zero of=/gpfs1/fileset6/file$p bs=256k count=4096 conv=fsync oflag=direct"
        # parse dd output: the rate is on the last line, fields separated by commas
        $writetofile 2>&1 | tail -n 1 | awk 'BEGIN { FS = "," } ; { print $3 }' | sed -e 's/MB\/s//g' >> $filerate &
    done
    wait
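
The script above only records the per-process rates; a minimal sketch of the summation step, assuming each rate file ends up with one MB/s value per line, might look like this:

    # sum the per-process rates (one MB/s value per line) to get the aggregate
    # rate for each of the three cases; paths match the variables set above
    for f in ~/scratch/rate5 ~/scratch/rate4 ~/scratch/rate3
    do
        awk -v name="$f" '{ sum += $1 } END { printf "%s: %.1f MB/s aggregate\n", name, sum }' "$f"
    done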

Results: the per-process write rates for the 30 processes are as follows:

  1. Writing to the disk using only the conv=fsync option, each process gets a write rate of ~180 MB/s
  2. Writing to the disk using both conv=fsync and oflag=direct, each process gets a write rate of ~9 MB/s
  3. Writing to a file in the GPFS file system, using both conv=fsync and oflag=direct, each process gets a write rate of ~80 MB/s

Best Answer

This difference undoubtedly comes down to one thing: caching.

It will be really difficult to pin down exactly where, especially from userland, but all Linux kernels buffer (cache) ordinary filesystem writes unless you take special steps to get synchronous writes. That is, the kernel saves the data dd writes to a file somewhere in kernel memory, presumably going through the file system code to do so. Some time in the future, the kernel schedules the corresponding disk blocks to go out to the disk. That happens "asynchronously", some time after the kernel has already told dd that the write finished.
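
A quick way to see this buffering in action (a rough sketch, not part of the original benchmark; the /tmp path and sizes are just examples, and /tmp must be on a disk-backed file system rather than tmpfs) is to watch the Dirty counter in /proc/meminfo around a buffered write:

    # buffered write: dd returns almost immediately while Dirty grows,
    # then drains as the kernel writes the cached pages back to disk
    grep Dirty /proc/meminfo
    dd if=/dev/zero of=/tmp/buffered-test bs=256k count=4096
    grep Dirty /proc/meminfo    # typically much larger right after dd exits
    sync                        # force writeback to the disk
    grep Dirty /proc/meminfo    # back down once the data has hit the disk
    rm /tmp/buffered-test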

The reason for this is that moving bytes over a bus and into a disk drive, and then onto the disk platters, is much slower than even copying from user to kernel memory. Ordinarily, programs don't care too much that the data they just "wrote" won't make it to the disk for a while. Hardware reliability is high enough that the data almost always makes it to the platter.

That's the simple answer, but once reads/writes/deletes are all buffered up in the kernel, the file system code can take advantage of short file lifetimes by never writing out the data of files that get deleted before their data reaches the disk. It can also group small writes that fall within a single larger disk block and consolidate them into one write. There are plenty of optimizations like these in most file systems.
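
As a rough illustration of the short-lifetime point (a sketch only, assuming /tmp is on a disk-backed file system rather than tmpfs and that iostat from the sysstat package is available), a small buffered file that is deleted before writeback may generate little or no disk traffic at all:

    # watch per-device write activity in the background while a short-lived
    # file is created and removed before the kernel gets around to writing it
    iostat -d 1 &
    IOSTAT_PID=$!
    dd if=/dev/zero of=/tmp/short-lived bs=256k count=64   # buffered write
    rm /tmp/short-lived                                     # deleted before writeback
    sleep 5
    kill $IOSTAT_PID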
