Bash – Why is this find command not returning only filenames containing non-ASCII characters?

bash, character encoding, files, find, unicode

I'm trying to determine the root cause of why this find command is not working; it shouldn't match the file called this_should_not_match below:

$ find . -type f -name "*[^ -~]*"
./__º╚t
./this_should_not_match
./__╞_u
./__¡VW
./__▀√Z
./__εè_
./__∙Σ_
./__Σ_9
./__Σhm
./__φY_

My shell is Bash 3.2.

Best Answer

Ranges only work reliably and portably in the C locale. In other locales the behaviour varies, but generally [x-y] matches the characters (strictly speaking, collating elements, which can even be sequences of characters) that sort after x and before y in some sort order. That order is often obscure and not always the same one sort would use.

In the C locale (see What does “LC_ALL=C” do?), characters are bytes and ranges are based on the code point of the characters (on byte values).

LC_ALL=C find . -type f -name "*[^ -~]*"

On an ASCII-based system, this command lists regular files whose names contain at least one byte outside the range 32 to 126. POSIX doesn't guarantee that the C locale uses the ASCII charset, but in practice you will be using ASCII unless you're on some special EBCDIC-based IBM mainframe OS, in which case you'd know about it.

Also note that in a multi-byte character locale (such as the UTF-8 ones, the norm nowadays), the * may not even match all file names: on some systems, it fails to match sequences of bytes that don't form valid characters.
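As a minimal sketch of the C-locale command above, the following builds a scratch directory with one pure-ASCII name and one name containing UTF-8 bytes, then runs the same pattern against it. The directory and file names are hypothetical, chosen only for illustration, and the sketch assumes an ASCII-based system.

```shell
# Hypothetical scratch directory and file names, used only for this demo.
demo_dir=$(mktemp -d)
nonascii_name="caf$(printf '\303\251')"   # "cafe" with an accented e, as UTF-8 bytes
touch "$demo_dir/this_should_not_match" "$demo_dir/$nonascii_name"

# In the C locale, [^ -~] matches any single byte outside 0x20..0x7E,
# so only names containing such a byte are reported.
matches=$(LC_ALL=C find "$demo_dir" -type f -name '*[^ -~]*')
printf '%s\n' "$matches"

rm -rf "$demo_dir"
```

Setting LC_ALL only for the find invocation, rather than exporting it, leaves the rest of the session's locale untouched.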

Related Solutions

Simple command “find” not working

The syntax of find is not what you have written; please read the manual page (man find) for the details.

For example, to find files named index.php in the current directory and all subdirectories under it, you can use:

find . -name index.php -type f

To match any file whose name contains index.php, such as findex.php, index.phpfoo, or index.php itself, use:

find . -name '*index.php*' -type f

* is a glob wildcard matching zero or more characters. The quotes keep the shell from expanding it before find sees the pattern.

To look in the current directory only, without descending into subdirectories:

find . -maxdepth 1 -name '*index.php*' -type f
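The three variants can be compared side by side on a small scratch tree. This is only a sketch: the directory layout and the variable names are hypothetical, and -maxdepth is assumed to be available (it is in GNU and BSD find).

```shell
# Hypothetical scratch tree: two matching names at the top, one in a subdirectory.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/sub"
touch "$demo_dir/index.php" "$demo_dir/findex.php" "$demo_dir/sub/index.phpfoo"

# Exact name, searched recursively.
exact=$(find "$demo_dir" -name index.php -type f | wc -l)

# Any name containing index.php, searched recursively.
substring=$(find "$demo_dir" -name '*index.php*' -type f | wc -l)

# Same pattern, restricted to the top-level directory.
top_only=$(find "$demo_dir" -maxdepth 1 -name '*index.php*' -type f | wc -l)

echo "exact=$exact substring=$substring top_only=$top_only"
rm -rf "$demo_dir"
```

With this layout the counts should come out as 1, 3, and 2 respectively.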

Find not returning expected files

This fails because of the way permissions interact with the wildcard * character. We can reproduce this on a local file-system like this:

Set up the scenario, a directory tree under /tmp/top:

sudo -s <<'x'
mkdir -p /tmp/top/{a,b}/dir/sub/
touch /tmp/top/{a,b}/dir/sub/file
chown root /tmp/top/?/dir
chmod go= /tmp/top/?/dir
x

Notice that we have no permission as an ordinary user to go below /tmp/top/*/dir:

find /tmp/top/?/dir -type f
find: ‘/tmp/top/a/dir’: Permission denied
find: ‘/tmp/top/b/dir’: Permission denied

Try descending with root privileges from a directory that we cannot reach as an ordinary user:
```
sudo find /tmp/top/*/dir/sub -type f
find: ‘/tmp/top/*/dir/sub’: No such file or directory
```
Remember that the shell evaluates wildcards before the command is executed, and it does so as your ordinary user, not as root. So what is happening here is that the path containing the wildcard * is expanded first. Your ordinary user account cannot verify the existence of sub, and so the entire path cannot be verified. The wildcard remains as an asterisk (the default behaviour when a match fails) and the root privileged find is given the literal path /tmp/top/*/dir/sub to descend. This path does not exist, hence the error.

Try descending with root privileges from a directory that we can reach as an ordinary user:
```
sudo find /tmp/top/*/dir -type f
/tmp/top/a/dir/sub/file
/tmp/top/b/dir/sub/file
```
What happens here is similar, but with more useful consequences. The path /tmp/top/*/dir can be evaluated completely as your ordinary user, resulting in two paths /tmp/top/a/dir and /tmp/top/b/dir. These are passed to the root privileged find, and it can subsequently descend these - through the root-only subdirectory - and list the files it discovers.
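The two expansion outcomes described above can be observed without sudo. In this sketch a hypothetical scratch tree stands in for /tmp/top, and the failed match is caused by a missing directory rather than by permissions; the effect on the unmatched pattern is assumed to be the same, since in both cases the shell finds nothing to expand to.

```shell
# Hypothetical scratch tree standing in for /tmp/top.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/a/dir" "$demo_dir/b/dir"

# Pattern that matches: the shell replaces it with the two real paths.
matched=$(printf '%s\n' "$demo_dir"/*/dir)

# Pattern that matches nothing: bash passes it through unchanged
# (the default behaviour, with nullglob and failglob unset).
unmatched=$(printf '%s\n' "$demo_dir"/*/dir/sub)

printf 'matched:\n%s\nunmatched:\n%s\n' "$matched" "$unmatched"
rm -rf "$demo_dir"
```

The unmatched line still contains the literal asterisk, which is exactly what the root privileged find receives in the failing case.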

In your situation, it is highly likely that the .local directories in your wildcarded path cannot be accessed without root privileges, but the higher level directories are quite accessible. While you specify a path that can be evaluated as your ordinary user account, the find can proceed with the expanded set of paths. As soon as you specify a path that cannot be evaluated in the context of your ordinary user account, the expansion fails and find is passed a path that contains a literal * character. This of course fails to match and the find fails.

To resolve the issue you simply need to defer the evaluation of the path until your command is running as root:

sudo bash -c "find /nfshome/*/.local/ -type f -size +1G -exec ls -lh {} \;"
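The deferral works because the double quotes stop the calling shell from expanding *, so the pattern reaches the inner bash intact. A sketch of the same mechanism without sudo, using a hypothetical scratch tree in place of /nfshome and assuming the temporary path contains no whitespace:

```shell
# Hypothetical scratch tree standing in for /nfshome.
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/a/.local" "$demo_dir/b/.local"

# Inside double quotes the outer shell expands $demo_dir but not *;
# the glob is expanded by the inner bash when it parses the command string.
deferred=$(bash -c "printf '%s\n' $demo_dir/*/.local/")

printf '%s\n' "$deferred"
rm -rf "$demo_dir"
```

When the inner shell is started through sudo, that expansion runs with root's permissions, which is why the root-only directories become visible to the pattern.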
