Find – Why GNU Find is Faster Than Graphical File Search Utilities


I'm trying to find a file that doesn't exist in my home directory and all subdirectories.

find ~/ -name "bogus" gives me that information after a few seconds, yet KDE's Dolphin file manager needed almost 3 minutes to do the same. This matches my previous experience with GNOME Beagle.

How does find manage to do this so quickly while graphical search (which is more intuitive to use than command-line parameters) lags so far behind?

Best Answer

Looking at Dolphin with Baloo specifically, it seems to look up the metadata of every file in its search domain, even if you're doing a simple file name search. When I trace the file.so process, I see calls to lstat, getxattr and getxattr again for every file, and even for .. entries. These system calls retrieve metadata about the file which is stored in a different location from the file name (the file name is stored in the directory contents, but the metadata are in the inode). Querying the metadata of a file multiple times is cheap since the data would be in the disk cache, but there can be a significant difference between querying the metadata and not querying the metadata.
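To make the distinction concrete, here is a minimal Python sketch of the two access patterns. It is only an illustration, not how Baloo or Dolphin is actually implemented; the function names are made up for this example, and os.listxattr is assumed to exist only on some platforms (such as Linux).

```python
import os

def names_only(directory):
    """Read just the directory contents: no per-file system calls."""
    return sorted(os.listdir(directory))

def names_with_metadata(directory):
    """Also fetch inode metadata and extended attribute names per entry."""
    result = {}
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        st = os.lstat(path)  # inode metadata: type, size, timestamps, ...
        xattrs = []
        # os.listxattr is not available everywhere, so guard the call.
        if hasattr(os, "listxattr"):
            try:
                xattrs = os.listxattr(path, follow_symlinks=False)
            except OSError:
                xattrs = []
        result[name] = (st.st_size, xattrs)
    return result
```

The first function touches only the directory itself. The second makes at least one extra call per entry, and each of those calls may have to read an inode that is not yet in the cache.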

find is much more clever. It tries to avoid unnecessary system calls. It won't call getxattr because it doesn't search based on extended attributes. When it's traversing a directory, it may need to call lstat on non-matching file names because that may be a subdirectory to search recursively (lstat is the system call that returns file metadata including the file type such as regular/directory/symlink/…). However find has an optimization: it knows how many subdirectories a directory has from its link count (on traditional Unix filesystems, a directory's link count is 2 plus the number of its subdirectories), and it stops calling lstat once it knows that it has traversed all the subdirectories. In particular, in a leaf directory (a directory with no subdirectories), find only checks the names, not the metadata. Furthermore, some filesystems keep a copy of the file type in the directory entry, so find doesn't even need to call lstat if that's the only information it needs.
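The link-count idea can be sketched in Python as follows. This is a simplified illustration, not GNU find's actual code: find_by_name is a hypothetical helper, it assumes a directory's link count is 2 plus its number of subdirectories, and it ignores permission errors. If the link count is below 2, the sketch treats the count as unreliable and checks every entry.

```python
import os
import stat

def find_by_name(root, target):
    """Name-only search under root; returns (matches, number of lstat calls)."""
    matches = []
    lstat_calls = 1  # the lstat on root below
    stack = [(root, os.lstat(root).st_nlink)]
    while stack:
        directory, nlink = stack.pop()
        # Assumption: link count == 2 + number of subdirectories.
        # A count below 2 means the filesystem does not follow that
        # convention, so fall back to checking every entry.
        subdirs_left = nlink - 2 if nlink >= 2 else None
        for name in os.listdir(directory):
            path = os.path.join(directory, name)
            if name == target:
                matches.append(path)
            if subdirs_left == 0:
                continue  # every subdirectory already seen: names only
            st = os.lstat(path)
            lstat_calls += 1
            if stat.S_ISDIR(st.st_mode):
                stack.append((path, st.st_nlink))
                if subdirs_left is not None:
                    subdirs_left -= 1
    return matches, lstat_calls
```

In a leaf directory subdirs_left starts at 0, so the loop compares names and never calls lstat. The directory-entry file type mentioned above is what Python's os.scandir exposes through DirEntry.is_dir(), which in most cases needs no extra system call.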

If you run find with options that require checking the metadata, it'll make more lstat calls, but it still won't make an lstat call on a file if it doesn't need the information (for example because the file is excluded by a previous condition matching on the name).
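A rough Python sketch of that short-circuit behaviour, loosely analogous to a name test followed by a size test: the name is checked first, and lstat is called only for entries that pass it. The function name and parameters are hypothetical, and the counter tracks only the lstat calls made for the size test, not whatever os.walk does internally.

```python
import fnmatch
import os

def find_name_and_size(root, pattern, min_bytes):
    """Return (matches, lstat calls made for the size test)."""
    matches = []
    lstat_calls = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            if not fnmatch.fnmatchcase(name, pattern):
                continue  # name test failed: metadata is never fetched
            path = os.path.join(dirpath, name)
            lstat_calls += 1
            if os.lstat(path).st_size > min_bytes:
                matches.append(path)
    return matches, lstat_calls
```

Putting the cheap name test before the metadata test means the cost of the metadata lookups scales with the number of name matches rather than with the total number of files.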

I suspect that other GUI search tools that reinvent the find wheel are similarly less clever than the command-line utility, which has undergone decades of optimization. Dolphin, at least, is clever enough to use the locate database if you search “everywhere” (with the limitation, which isn't clear in the UI, that the results may be out of date).
