What makes fsck so slow on big filesystems

ffs · filesystems · fsck · openbsd

I have over a dozen filesystems on my OpenBSD server, which has 12 GB of DDR3 RAM and several 1.5 TB HDDs. The filesystems themselves are generally between 8 GB and 64 GB in size.

I've noticed that even though I follow the best practice of keeping them small, fsck is still very slow on reboot.

What makes fsck so slow? The raw filesystem size? The total number of inodes (iused + ifree)? The number of used inodes? Something else entirely? Is there any easy way to improve fsck times even further?

Best Answer

The purpose of running fsck is to find inconsistencies. Doing so means walking the filesystem and looking at each directory entry (directory/file) as well as the data behind it, to verify, for example, that the size recorded in the directory entry matches the actual size of the data. This process has always been slow. In the old days we didn't notice, since filesystems were much smaller, contained far fewer files, and computers took longer to boot anyway (services were started sequentially). Since the speed of rotating disks isn't increasing the way their capacity is, running a full filesystem check during system start is becoming less and less feasible.
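To make the scaling concrete, here is a toy sketch in Python of what such a full consistency pass has to do. It is an illustration of the idea only, not how fsck_ffs is actually implemented: the point is that the cost grows with the number of entries that must be visited, plus the data that must be read behind them.

```python
# Toy model of why a full consistency check scales with the number of
# used inodes: every entry must be visited and cross-checked, so doubling
# the file count roughly doubles the check time regardless of disk size.
# Illustrative sketch only, not how fsck_ffs is actually implemented.

from dataclasses import dataclass

@dataclass
class Inode:
    recorded_size: int   # size stored in the inode/directory entry
    data: bytes          # the blocks actually on disk

def full_check(filesystem: dict[str, Inode]) -> list[str]:
    """Walk every entry and verify metadata against the data behind it."""
    errors = []
    for path, inode in filesystem.items():          # O(number of used inodes)
        if inode.recorded_size != len(inode.data):  # plus reading the data itself
            errors.append(f"{path}: size mismatch "
                          f"({inode.recorded_size} recorded, {len(inode.data)} actual)")
    return errors

# A tiny "filesystem" with one inconsistency, as fsck might find after a crash.
fs = {
    "/etc/passwd": Inode(recorded_size=5, data=b"root:"),
    "/var/log/messages": Inode(recorded_size=100, data=b"truncated by crash"),
}
print(full_check(fs))   # -> ['/var/log/messages: size mismatch (100 recorded, 18 actual)']
```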

That's why most reasonably modern filesystems (ext3, ext4, reiserfs, XFS, ...) no longer do a full filesystem check on reboot. Instead they use a journal for bookkeeping. Before a change is written to disk, it is written to the journal. Once the change is complete, the outstanding transaction is marked as complete in the journal. Should the system die before a transaction completes, the filesystem knows which transactions were underway and can "replay" them to bring the filesystem back into a consistent state. This tends to be much faster than running a full filesystem check. Modern filesystems use a number of clever tricks to reduce the overhead of maintaining the journal; in practice you often don't notice the difference.
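As a rough illustration of the idea (my own toy model, not the on-disk format of ext3/ext4/XFS or any real journal): the crucial point is that after a crash only the journal has to be replayed, not the whole filesystem. Real journals decide between redoing and discarding an interrupted transaction based on whether its commit record made it to disk; this sketch simply redoes it.

```python
# Minimal write-ahead journaling sketch: every change is recorded in the
# journal before it touches the "disk", so recovery only needs to walk the
# journal instead of scanning the whole filesystem.

journal = []   # ordered list of per-transaction records
disk = {}      # the filesystem proper

def write(txid, key, value, crash_before_commit=False):
    journal.append({"tx": txid, "key": key, "value": value, "done": False})
    if crash_before_commit:
        return                      # simulate power loss mid-transaction
    disk[key] = value               # apply the change to the filesystem
    journal[-1]["done"] = True      # mark the transaction complete

def replay():
    """Recovery: redo only the transactions that were not marked complete."""
    for rec in journal:
        if not rec["done"]:
            disk[rec["key"]] = rec["value"]   # redo the interrupted change
            rec["done"] = True

write(1, "fileA", "new contents")
write(2, "fileB", "half-written", crash_before_commit=True)
replay()                            # cost is proportional to the journal, not the disk
print(disk)                         # {'fileA': 'new contents', 'fileB': 'half-written'}
```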

The latest generation of filesystems (btrfs, ZFS, ...) use copy-on-write techniques, which means a transaction that modifies a file or metadata never overwrites existing data. Instead, the new data is written to separate blocks, and once the new copy is complete the filesystem atomically switches over to it. This also effectively prevents the filesystem from becoming inconsistent (and it has some other advantages as well).
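Again as a hedged toy model (not btrfs or ZFS internals), the core trick is that the new copy is written out in full first and a single pointer switch makes it live, so a crash at any point leaves either the old or the new version intact, never a half-written mix:

```python
# Rough copy-on-write sketch: new data goes to fresh blocks, and only an
# atomic pointer switch makes it visible. Illustrative model only.

blocks = {"block0": "old contents"}     # allocated data blocks
root = "block0"                         # the "superblock" pointer currently in use

def cow_update(new_data, crash_before_switch=False):
    global root
    new_block = f"block{len(blocks)}"   # never overwrite existing data
    blocks[new_block] = new_data        # write the full new copy first
    if crash_before_switch:
        return                          # old version still referenced via root
    root = new_block                    # atomic switch to the new copy

cow_update("new contents", crash_before_switch=True)
print(blocks[root])                     # -> 'old contents' (still consistent)
cow_update("new contents")
print(blocks[root])                     # -> 'new contents'
```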

Consider using a journaling filesystem or a copy-on-write filesystem if you want your system to start quickly.
