rm -r is expected to be slow, since it has to do a recursive, depth-first traversal of the directory structure.
Now, how did you create 10 million files? Did you use a script that loops in some order (1.txt, 2.txt, 3.txt, ...)? If so, those files may also have been allocated in that same order, in contiguous blocks on the HDD, so deleting them in the same order should be faster.
"ls -f" enables -aU, which lists entries in raw directory order without sorting, matching that on-disk order.
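As a minimal sketch of the idea (the directory name and file count here are made up for illustration): GNU find's -delete removes entries in directory order without sorting names first, which avoids rm's overhead of building and sorting a huge name list.

```shell
#!/bin/bash
# Hypothetical demo: remove a large tree in raw directory order.
# find -delete walks entries as the directory returns them (like ls -f),
# instead of sorting millions of names in memory first.
dir=$(mktemp -d)
for i in $(seq 1 1000); do : > "$dir/$i.txt"; done

# Delete contents in directory order; -mindepth 1 keeps the top directory,
# which we then remove once it is empty.
find "$dir" -mindepth 1 -delete
rmdir "$dir"
```

With 10 million files the difference matters mostly because nothing has to be held or sorted in memory; the kernel just hands back entries one batch at a time.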
I did a small benchmark. It only tests writes, though.
The test data is a Linux kernel source tree (linux-3.8), already unpacked into memory (tmpfs on /dev/shm), so the data source should have as little influence as possible. I used compressible data for this test, since compressing incompressible files is pointless regardless of encryption.
The setup is a btrfs filesystem on a 4 GiB LVM volume, on LUKS [aes, xts-plain, sha256], on RAID-5 over 3 disks with a 64 KiB chunk size. The CPU is an Intel E8400 (2×3 GHz, no AES-NI). The kernel is 3.8.2 x86_64.
The script:
#!/bin/bash
PARTITION="/dev/lvm/btrfs"
MOUNTPOINT="/mnt/btrfs"

umount "$MOUNTPOINT" >& /dev/null

for method in no lzo zlib
do
    for iter in {1..3}
    do
        echo Prepare compress="$method", iter "$iter"
        mkfs.btrfs "$PARTITION" >& /dev/null
        mount -o compress="$method",compress-force="$method" "$PARTITION" "$MOUNTPOINT"
        sync
        time (cp -a /dev/shm/linux-3.8 "$MOUNTPOINT"/linux-3.8 ; umount "$MOUNTPOINT")
        echo Done compress="$method", iter "$iter"
    done
done
So each iteration makes a fresh filesystem, then measures the time it takes to copy the Linux kernel source from memory and unmount. It's a pure write test, with zero reads.
The results:
Prepare compress=no, iter 1
real 0m12.790s
user 0m0.127s
sys 0m2.033s
Done compress=no, iter 1
Prepare compress=no, iter 2
real 0m15.314s
user 0m0.132s
sys 0m2.027s
Done compress=no, iter 2
Prepare compress=no, iter 3
real 0m14.764s
user 0m0.130s
sys 0m2.039s
Done compress=no, iter 3
Prepare compress=lzo, iter 1
real 0m11.611s
user 0m0.146s
sys 0m1.890s
Done compress=lzo, iter 1
Prepare compress=lzo, iter 2
real 0m11.764s
user 0m0.127s
sys 0m1.928s
Done compress=lzo, iter 2
Prepare compress=lzo, iter 3
real 0m12.065s
user 0m0.132s
sys 0m1.897s
Done compress=lzo, iter 3
Prepare compress=zlib, iter 1
real 0m16.492s
user 0m0.116s
sys 0m1.886s
Done compress=zlib, iter 1
Prepare compress=zlib, iter 2
real 0m16.937s
user 0m0.144s
sys 0m1.871s
Done compress=zlib, iter 2
Prepare compress=zlib, iter 3
real 0m15.954s
user 0m0.124s
sys 0m1.889s
Done compress=zlib, iter 3
With zlib it's a lot slower; with lzo, a bit faster; and in general, not worth the bother (the difference is too small for my taste, considering I used easy-to-compress data for this test).
I'd run a read test as well, but it's more complicated, since you have to deal with caching.
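For what it's worth, a read test could be sketched roughly like this (the file size and helper name are my own choices, and dropping the page cache requires root; without root the fallback below just measures cached reads):

```shell
#!/bin/bash
# Hypothetical read-benchmark sketch: drop the page cache between runs
# so the timed read actually hits the disk instead of memory.
dir=$(mktemp -d)
dd if=/dev/urandom of="$dir/testfile" bs=1M count=8 2>/dev/null

drop_caches() {
    sync
    # Writing 3 frees the page cache plus dentries and inodes (root only);
    # as non-root this silently does nothing.
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches
    fi
}

drop_caches
time cat "$dir/testfile" > /dev/null   # cold read (if caches were dropped)
time cat "$dir/testfile" > /dev/null   # warm read, served from the page cache

rm -r "$dir"
```

Comparing the two timings shows how dominant caching is; without the drop_caches step, every run after the first would measure RAM speed, not the filesystem.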
Best Answer
It depends. There is no general answer to this question.
In the absence of caching, writing a disk file is usually measurably slower than reading. This has little to do with the operating system and everything to do with the hardware: both hard disks and solid state media read faster than they write. A secondary factor is related to filesystem structure: reading only needs to traverse the directory tree and block list down to the data, then read the data, whereas writing needs to perform the same traversal, then write the data, then update some metadata.
When caching comes into play, things change. Reading data that's in cache is very fast, but reading data that isn't in cache has to go and fetch it from the disk. Operating systems might try to anticipate reads, but that only works in very specific cases (mainly sequential reads from a file). Writing, on the other hand, can be near-instantaneous as long as the amount of data isn't too large, as the data is only written to a memory buffer. The buffer has to be written to disk eventually, but by that time your application has already moved on to do more stuff.
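The write-back effect described above is easy to observe with dd (the file size here is arbitrary): a plain write returns as soon as the data is in the kernel's buffer, while conv=fsync makes dd wait until the data has reached stable storage.

```shell
#!/bin/bash
# Illustration of write-back caching: compare a buffered write
# against one that is forced to disk before dd exits.
f=$(mktemp)

# Buffered: dd returns once the data sits in the page cache.
time dd if=/dev/zero of="$f" bs=1M count=32 2>/dev/null

# Synced: dd additionally calls fsync() and waits for the device.
time dd if=/dev/zero of="$f" bs=1M count=32 conv=fsync 2>/dev/null

rm "$f"
```

On a rotating disk the second timing is typically much larger; on tmpfs or a fast SSD the gap shrinks, which is exactly the "it depends" in this answer.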