How is updatedb so much faster than find

findlocateupdatedb

How is updatedb so much faster than find?

Here's a timed comparison between updatedb and a find command that does a seemingly similar task.

compare.sh

#!/usr/bin/env bash

cmd="sudo updatedb"
echo $cmd
time eval $cmd

cmd="sudo find / \
    -fstype ext4 \
    -not \( \
        -path '/afs/*' -o \
        -path '/net/*' -o \
        -path '/sfs/*' -o \
        -path '/tmp/*' -o \
        -path '/udev/*' -o \
        -path '/var/cache/*' -o \
        -path '/var/lib/pacman/local/*' -o \
        -path '/var/lock/*' -o \
        -path '/var/run/*' -o \
        -path '/var/spool/*' -o \
        -path '/var/tmp/*' -o \
        -path '/proc/*' \
    \) &>/dev/null"

echo $cmd
time eval $cmd

My /etc/updatedb.conf:

PRUNE_BIND_MOUNTS = "yes"
PRUNEFS = "9p afs anon_inodefs auto autofs bdev binfmt_misc cgroup cifs coda configfs cpuset cramfs debugfs devpts devtmpfs ecryptfs exofs ftpfs fuse fuse.encfs fuse.sshfs fusectl gfs gfs2 hugetlbfs inotifyfs iso9660 jffs2 lustre mqueue ncpfs nfs nfs4 nfsd pipefs proc ramfs rootfs rpc_pipefs securityfs selinuxfs sfs shfs smbfs sockfs sshfs sysfs tmpfs ubifs udf usbfs vboxsf"
PRUNENAMES = ".git .hg .svn"
PRUNEPATHS = "/afs /net /sfs /tmp /udev /var/cache /var/lib/pacman/local /var/lock /var/run /var/spool /var/tmp"

For the find command I just specified ext4 filesystem because that's the only filesystem updatedb should end up looking through. I didn't bother with the file extensions and I don't know how to exclude a bind mount from find but I don't have any. I also added an exclusion for '/proc' which it seems that updatedb ignores. I should have also ignored '/sys'.

If there'd be any difference I'd expect the find command to be a little faster since it's rules are a little simpler and it doesn't have to write to disk. Instead updatedb is much faster.

$ ./compare.sh
sudo updatedb

real    0m0.876s
user    0m0.443s
sys 0m0.273s

sudo find / -fstype ext4 -not \( -path '/afs/*' -o -path '/net/*' -o -path '/sfs/*' -o -path '/tmp/*' -o -path '/udev/*' -o -path '/var/cache/*' -o -path '/var/lib/pacman/local/*' -o -path '/var/lock/*' -o -path '/var/run/*' -o -path '/var/spool/*' -o -path '/var/tmp/*' -o -path '/proc/*' \) &>/dev/null

real    6m23.499s
user    0m14.527s
sys 0m10.993s

What are they doing differently?

Best Answer

See the man page for updatedb, "If the database already exists, its data is reused to avoid rereading directories that have not changed".

Whereas the find command traverses all directories regardless of whether they have changed.