MacOS – GNU version of locate – gupdatedb fails with “gfind: failed to read file names from file system”

catalina, command-line, macos

I am trying out the GNU version of the locate command. First, I have to create the database like this:

sudo gupdatedb --prunepaths=/Volumes --output=$HOME/locatedb_gupdatedb

Unfortunately, about 1 minute after launching the command, I get the following error:

gfind: failed to read file names from file system at or below '/': No such file or directory

I don't understand where this error could come from.

UPDATE 1: I replaced gfind with the find command and corrected the path of find, which is /usr/bin/. Unfortunately, I get annoying error messages when I launch the gupdatedb command like this:

sudo gupdatedb --prunepaths='/private/tmp /private/var/folders /private/var/tmp */Backups.backupdb /Volumes /System' --output=$HOME/locatedb_gupdatedb

Here are the error messages:

find: /System/Volumes/Data/.Spotlight-V100: No such file or directory
find: /System/Volumes/Data/.PKInstallSandboxManager: No such file or directory
find: /System/Volumes/Data/.PKInstallSandboxManager-SystemSoftware: No such file or directory
find: /System/Volumes/Data/.cleverfiles: No such file or directory
find: /System/Volumes/Data/mnt: No such file or directory
find: /System/Volumes/Data/.DocumentRevisions-V100: No such file or directory
etc ...

I tried modifying this option in the /usr/local/Cellar/findutils/4.7.0/libexec/bin/gupdatedb file:

: ${FINDOPTIONS="2 > /dev/null"}

But the problem is that this option is placed in front of the find command, not at the end, so it does not work in the rest of the file.

Since there are many find commands later in the script, I can't manually add the 2 > /dev/null redirection to each one.

Can anyone see how to suppress all these error messages from the find command when I launch gupdatedb?

UPDATE 2: I finally managed to create a database with gupdatedb (the GNU version of the macOS updatedb command) by running:

sudo gupdatedb --prunepaths='/private/tmp /private/var/folders /private/var/tmp */Backups.backupdb /System /Volumes' --output=$HOME/locatedb_gupdatedb

The problem now is that when I search for a substring of a file or directory name, the entries appear duplicated in the results (sub_string is simply part of a file or directory name).

For example, if I run: glocate -d ~/locatedb_gupdatedb sub_string

Then I get duplicate results like:

/System/Volumes/Data/Users/fab/sub_string.dat
/Users/fab/sub_string.dat

I don't know how to exclude '/System/Volumes/Data/' from these results. I did specify the /System directory in the --prunepaths option, so why isn't it taken into account in the database created by gupdatedb?

Or maybe I should run:

sudo gupdatedb --prunepaths='/private/tmp /private/var/folders /private/var/tmp */Backups.backupdb /System/Volumes/Data /Volumes' --output=$HOME/locatedb_gupdatedb

??

Any help excluding this directory

/System/Volumes/Data

from the index database is welcome.

UPDATE 3: For comparison, updatedb generates a database quickly on Debian 10 Buster. Only a few ordinary changes were made to the system between the timestamps of the two updatedb runs.


So I conclude there really is a difference between the GNU/macOS and GNU/Linux implementations.

Any explanation is welcome.

Best Answer

The problem here is that you are running GNU's updatedb with macOS's find. gupdatedb is a shell script that expects to run a GNU-compatible find. In particular, it converts --prunepaths to a GNU-style basic regular expression.

--prunepaths='/private/tmp /private/var/folders /private/var/tmp */Backups.backupdb /System /Volumes'

converts to

PRUNEREGEX="\(^/private/tmp$\)\|\(^/private/var/folders$\)\|\(^/private/var/tmp$\)\|\(^*/Backups.backupdb$\)\|\(^/System$\)\|\(^/Volumes$\)"
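As a rough illustration of that conversion, the following sketch turns a space-separated prune list into the same kind of anchored alternation. The function name prune_to_regex is hypothetical, and the sed expressions only approximate what such a script does; they are not copied from gupdatedb.

```shell
#!/bin/sh
# Hypothetical helper (not part of findutils): turn a space-separated
# prune list into a GNU-style BRE of anchored alternatives.
prune_to_regex() {
    # 1st expression: prefix the list with \(^
    # 2nd expression: replace every space with $\)\|\(^
    # 3rd expression: append $\) to the end
    printf '%s\n' "$1" |
        sed -e 's,^,\\(^,' -e 's, ,$\\)\\|\\(^,g' -e 's,$,$\\),'
}

prune_to_regex '/private/tmp /Volumes /System'
# prints: \(^/private/tmp$\)\|\(^/Volumes$\)\|\(^/System$\)
```

Every \| in the output is the alternation operator discussed next, so a find without that extension sees one long pattern that no directory name matches.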

GNU BRE (basic regular expression) syntax treats \| as an "alternative" operator:

‘foo\|bar’ matches either ‘foo’ or ‘bar’

This is a GNU extension to the BRE syntax, and the macOS find does not accept it. So the whole prune expression fails to match anything.

The macOS (and POSIX) BRE syntax has no alternative operator, so there is no trivial fix. The macOS updatedb script converts the alternatives into explicit -or operators in the find command. You can modify the GNU script to do the same, or get gfind to work.
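One way to sketch the explicit -or approach is shown below. It is a minimal illustration, not the code of either updatedb script: the function name find_pruned is made up, and it uses find's -path test (a shell glob rather than a regex) joined with -o, the POSIX spelling of -or, which both macOS and GNU find accept.

```shell
#!/bin/sh
# Hypothetical helper: print everything under $1 except the paths given
# after it, using one -path test per prune entry joined with -o.
find_pruned() {
    root=$1
    shift
    # With no prune entries, fall back to a plain listing.
    [ "$#" -gt 0 ] || { find "$root" -print; return; }
    # Rebuild the argument list as: -o -path P1 -o -path P2 ...
    for p do
        shift
        set -- "$@" -o -path "$p"
    done
    shift   # drop the leading -o
    find "$root" \( "$@" \) -prune -o -print
}

# Example call (commented out because it walks the whole file system):
# find_pruned / /private/tmp /private/var/tmp '*/Backups.backupdb' /Volumes
```

Because -path takes glob patterns, an entry such as */Backups.backupdb can be passed unchanged, as long as it is quoted so the shell does not expand it first.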

By the way, GNU's updatedb script builds a completely new database from scratch on every run, just like macOS's.

Your Debian installation is using mlocate, which is a completely different implementation from GNU locate, not as widely ported, and not available on macOS AFAIK. And even though mlocate runs quickly on your Debian installation, that does not mean it runs a lot faster than GNU locate. Both run in under a second building the entire database from scratch on my Debian installation when all the disk metadata is in RAM (which is usually the case).

Apple provides mdfind, which uses the incremental database created by mds, which is largely triggered by file system events. This makes it (theoretically) much more efficient than even mlocate, which still has to traverse the entire directory structure looking for changed directories. The problem is that incremental builds accumulate errors. That is why locate, with its full rebuild every time, is still around and preferred by many.