rm -r is expected to be slow, since it has to do a recursive, depth-first traversal of the directory structure.
Now, how did you create 10 million files? Did you use a script that loops in some order (1.txt, 2.txt, 3.txt, ...)? If so, those files may also have been allocated in that same order, in contiguous blocks on the HDD, so deleting them in the same order should be faster.
"ls -f" enables -aU, which lists entries in raw directory order without sorting, matching that on-disk order.
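As a minimal sketch of the idea (the directory name and file count here are made up for illustration): GNU find's -delete removes entries in directory order without sorting names first, which avoids rm's overhead of building and sorting a huge name list.

```shell
#!/bin/bash
# Hypothetical demo: remove a large tree in raw directory order.
# find -delete walks entries as the directory returns them (like ls -f),
# instead of sorting millions of names in memory first.
dir=$(mktemp -d)
for i in $(seq 1 1000); do : > "$dir/$i.txt"; done

# Delete contents in directory order; -mindepth 1 keeps the top directory,
# which we then remove once it is empty.
find "$dir" -mindepth 1 -delete
rmdir "$dir"
```

With 10 million files the difference matters mostly because nothing has to be held or sorted in memory; the kernel just hands back entries one batch at a time.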
I did a small benchmark. It only tests writes, though.
The test data is a Linux kernel source tree (linux-3.8), already unpacked into memory (tmpfs on /dev/shm), so the data source should have as little influence as possible. I used compressible data for this test, since compressing incompressible files is pointless regardless of encryption.
The setup is a btrfs filesystem on a 4 GiB LVM volume, on LUKS [aes, xts-plain, sha256], on RAID-5 over 3 disks with a 64 KiB chunk size. The CPU is an Intel E8400 (2×3 GHz, no AES-NI). The kernel is 3.8.2 x86_64.
The script:
#!/bin/bash
PARTITION="/dev/lvm/btrfs"
MOUNTPOINT="/mnt/btrfs"

umount "$MOUNTPOINT" >& /dev/null

for method in no lzo zlib
do
    for iter in {1..3}
    do
        echo Prepare compress="$method", iter "$iter"
        mkfs.btrfs "$PARTITION" >& /dev/null
        mount -o compress="$method",compress-force="$method" "$PARTITION" "$MOUNTPOINT"
        sync
        time (cp -a /dev/shm/linux-3.8 "$MOUNTPOINT"/linux-3.8 ; umount "$MOUNTPOINT")
        echo Done compress="$method", iter "$iter"
    done
done
So each iteration makes a fresh filesystem, then measures the time it takes to copy the Linux kernel source from memory and unmount. It's a pure write test, with zero reads.
The results:
Prepare compress=no, iter 1
real 0m12.790s
user 0m0.127s
sys 0m2.033s
Done compress=no, iter 1
Prepare compress=no, iter 2
real 0m15.314s
user 0m0.132s
sys 0m2.027s
Done compress=no, iter 2
Prepare compress=no, iter 3
real 0m14.764s
user 0m0.130s
sys 0m2.039s
Done compress=no, iter 3
Prepare compress=lzo, iter 1
real 0m11.611s
user 0m0.146s
sys 0m1.890s
Done compress=lzo, iter 1
Prepare compress=lzo, iter 2
real 0m11.764s
user 0m0.127s
sys 0m1.928s
Done compress=lzo, iter 2
Prepare compress=lzo, iter 3
real 0m12.065s
user 0m0.132s
sys 0m1.897s
Done compress=lzo, iter 3
Prepare compress=zlib, iter 1
real 0m16.492s
user 0m0.116s
sys 0m1.886s
Done compress=zlib, iter 1
Prepare compress=zlib, iter 2
real 0m16.937s
user 0m0.144s
sys 0m1.871s
Done compress=zlib, iter 2
Prepare compress=zlib, iter 3
real 0m15.954s
user 0m0.124s
sys 0m1.889s
Done compress=zlib, iter 3
With zlib it's a lot slower; with lzo, a bit faster; and in general, not worth the bother (the difference is too small for my taste, considering I used easy-to-compress data for this test).
I'd run a read test as well, but it's more complicated, since you have to deal with caching.
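For what it's worth, a read test could be sketched roughly like this (the file size and helper name are my own choices, and dropping the page cache requires root; without root the fallback below just measures cached reads):

```shell
#!/bin/bash
# Hypothetical read-benchmark sketch: drop the page cache between runs
# so the timed read actually hits the disk instead of memory.
dir=$(mktemp -d)
dd if=/dev/urandom of="$dir/testfile" bs=1M count=8 2>/dev/null

drop_caches() {
    sync
    # Writing 3 frees the page cache plus dentries and inodes (root only);
    # as non-root this silently does nothing.
    if [ -w /proc/sys/vm/drop_caches ]; then
        echo 3 > /proc/sys/vm/drop_caches
    fi
}

drop_caches
time cat "$dir/testfile" > /dev/null   # cold read (if caches were dropped)
time cat "$dir/testfile" > /dev/null   # warm read, served from the page cache

rm -r "$dir"
```

Comparing the two timings shows how dominant caching is; without the drop_caches step, every run after the first would measure RAM speed, not the filesystem.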
Best Answer
It depends. There is no general answer to this question.
In the absence of caching, writing a disk file is usually measurably slower than reading. This has little to do with the operating system and everything to do with the hardware: both hard disks and solid state media read faster than they write. A secondary factor is related to filesystem structure: reading only needs to traverse the directory tree and block list down to the data, then read the data, whereas writing needs to perform the same traversal, then write the data, then update some metadata.
When caching comes into play, things change. Reading data that's in cache is very fast, but reading data that isn't in cache has to go and fetch it from the disk. Operating systems might try to anticipate reads, but that only works in very specific cases (mainly sequential reads from a file). Writing, on the other hand, can be near-instantaneous as long as the amount of data isn't too large, as the data is only written to a memory buffer. The buffer has to be written to disk eventually, but by that time your application has already moved on to do more stuff.
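The write-back effect described above is easy to observe with dd (the file size here is arbitrary): a plain write returns as soon as the data is in the kernel's buffer, while conv=fsync makes dd wait until the data has reached stable storage.

```shell
#!/bin/bash
# Illustration of write-back caching: compare a buffered write
# against one that is forced to disk before dd exits.
f=$(mktemp)

# Buffered: dd returns once the data sits in the page cache.
time dd if=/dev/zero of="$f" bs=1M count=32 2>/dev/null

# Synced: dd additionally calls fsync() and waits for the device.
time dd if=/dev/zero of="$f" bs=1M count=32 conv=fsync 2>/dev/null

rm "$f"
```

On a rotating disk the second timing is typically much larger; on tmpfs or a fast SSD the gap shrinks, which is exactly the "it depends" in this answer.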