Linux – the highest-performance Linux filesystem for storing a lot of small files (HDD, not SSD)

filesystems · linux · performance

I have a directory tree that contains many small files and a small number of larger files. The average size of a file is about 1 kilobyte. There are 210,158 files and directories in the tree (this number was obtained by running find | wc -l).

A small percentage of the files is added, deleted, or rewritten several times per week. This applies to the small files as well as to the (small number of) larger files.

The filesystems that I tried (ext4, btrfs) have problems with the placement of files on disk. Over a longer span of time, the physical positions of files on the disk (rotating media, not a solid-state disk) become more randomly distributed, and as a consequence the filesystem gets slower (for example, 4 times slower than a fresh filesystem).
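Fragmentation of this kind can be inspected with filefrag (works on most Linux filesystems) and, for ext filesystems, with e2freefrag, both from e2fsprogs. A sketch; the device and path names below are placeholders:

    # Show how many extents a file occupies (more extents = more fragmented).
    filefrag -v /srv/data/some/file

    # Summarize free-space fragmentation on an ext4 filesystem.
    e2freefrag /dev/sdb1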

Is there a Linux filesystem (or a filesystem maintenance method) that does not suffer from this performance degradation and can maintain a stable performance profile on rotating media? The filesystem may run on FUSE, but it needs to be reliable.

Best Answer

Performance

I wrote a small benchmark (source) to find out which filesystem performs best with hundreds of thousands of small files:

  • create 300000 files (512 B to 1536 B) with data from /dev/urandom
  • rewrite 30000 random files and change their size
  • read 30000 files sequentially
  • read 30000 files in random order
  • delete all files

  • sync and drop caches after every step (a minimal sketch of these steps follows this list)
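A minimal sketch of the loop in shell (the real implementation is in the linked source; the mount point /mnt/test and the use of head and shuf here are assumptions of this sketch):

    #!/bin/bash
    DIR=/mnt/test        # assumed mount point of the filesystem under test
    mkdir -p "$DIR"

    # create 300000 files of 512 to 1536 bytes with data from /dev/urandom
    for i in $(seq 1 300000); do
        head -c $((512 + RANDOM % 1025)) /dev/urandom > "$DIR/file_$i"
    done

    # flush dirty pages and drop the page cache (needs root), so the
    # next step measures the disk rather than RAM
    sync
    echo 3 > /proc/sys/vm/drop_caches

    # read 30000 files in random order
    for i in $(shuf -i 1-300000 -n 30000); do
        cat "$DIR/file_$i" > /dev/null
    done
    sync
    echo 3 > /proc/sys/vm/drop_caches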

Results (average time in seconds, lower = better):

Using Linux kernel version 3.1.7
Btrfs:
    create:    53 s
    rewrite:    6 s
    read sq:    4 s
    read rn:  312 s
    delete:   373 s

ext4:
    create:    46 s
    rewrite:   18 s
    read sq:   29 s
    read rn:  272 s
    delete:    12 s

ReiserFS:
    create:    62 s
    rewrite:  321 s
    read sq:    6 s
    read rn:  246 s
    delete:    41 s

XFS:
    create:    68 s
    rewrite:  430 s
    read sq:   37 s
    read rn:  367 s
    delete:    36 s

Result:
While ext4 had good overall performance, ReiserFS was extremely fast at reading sequential files. XFS turned out to be slow with many small files; you should not use it for this use case.

Fragmentation issue

The only way to keep a filesystem from distributing files all over the drive is to keep the partition only as big as you really need it, but be careful not to make the partition too small, or you will get intra-file fragmentation instead. Using LVM can be very helpful here.
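For illustration, with LVM you can start with a deliberately small logical volume and grow it, together with the filesystem, only when needed. A sketch; the volume group vg0, the volume name data, and ext4 are assumptions:

    # create a small logical volume and filesystem, leaving the
    # rest of the volume group unallocated
    lvcreate -L 10G -n data vg0
    mkfs.ext4 /dev/vg0/data

    # later, when the filesystem fills up, grow the volume and
    # resize the filesystem online (ext4 supports online growth)
    lvextend -L +5G /dev/vg0/data
    resize2fs /dev/vg0/data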

Further reading

The Arch Wiki has some great articles dealing with file system performance:

https://wiki.archlinux.org/index.php/Beginner%27s_Guide#Filesystem_types

https://wiki.archlinux.org/index.php/Maximizing_Performance#Storage_devices
