Command-Line – Get List of Subdirectories Containing a File with Specific String


How can I get a list of the subdirectories which contain a file whose name matches a particular pattern?

More specifically, I am looking for directories which contain a file with the letter 'f' somewhere occurring in the file name.

Ideally, the list would not have duplicates and only contain the path without the filename.

Best Answer

find . -type f -name '*f*' | sed -r 's|/[^/]+$||' |sort |uniq

The above finds all files below the current directory (.) that are regular files (-type f) and have f somewhere in their name (-name '*f*'). Next, sed removes the file name, leaving just the directory name. Then, the list of directories is sorted (sort) and duplicates removed (uniq).
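As a rough illustration, the pipeline can be tried on a throwaway tree. The directory and file names below, and the list_dirs helper, are invented for this sketch; it uses sed -E, which both GNU and BSD sed accept, in place of -r.

```shell
# Build a small scratch tree (hypothetical names) to run the pipeline against.
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b/c" "$demo/d"
touch "$demo/a/foo" "$demo/a/fig" "$demo/b/c/leaf.txt" "$demo/d/bar"

list_dirs() {
    # Run in a subshell so the caller's working directory is left alone.
    (cd "$1" && find . -type f -name '*f*' | sed -E 's|/[^/]+$||' | sort | uniq)
}

list_dirs "$demo"
# prints:
# ./a
# ./b/c
```

./a is listed once even though it holds two matching files, and ./d is absent because bar contains no f. Remove the scratch tree afterwards with rm -rf "$demo".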

The sed command consists of a single substitution. It looks for matches to the regular expression /[^/]+$ and replaces anything matching it with nothing. The dollar sign means the end of the line, and [^/]+ means one or more characters that are not slashes. Thus, /[^/]+$ matches everything from the final slash to the end of the line, which is the file name at the end of the full path. The sed command therefore removes the file name and leaves the name of the directory that the file was in unchanged.
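The substitution can be checked on its own by feeding it a few sample paths. This is only a sketch: the paths and the strip_name wrapper are made up for the example.

```shell
strip_name() {
    # Delete the final slash and everything after it on each input line.
    sed -E 's|/[^/]+$||'
}

printf '%s\n' './a/foo' './b/c/leaf.txt' './foo' | strip_name
# prints:
# ./a
# ./b/c
# .
```

A file directly under the starting point (./foo) reduces to ".", the current directory itself.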

Simplifications

The sort command has a -u flag (specified by POSIX) which makes uniq unnecessary. For GNU sed:

find . -type f -name '*f*' | sed -r 's|/[^/]+$||' |sort -u

And, for macOS (BSD) sed:

find . -type f -name '*f*' | sed -E 's|/[^/]+$||' |sort -u

Also, if your find command supports it, it is possible to have find print the directory names directly. This avoids the need for sed:

find . -type f -name '*f*' -printf '%h\n' | sort -u
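The -printf action is a GNU extension. Where it is unavailable, one possible fallback (not part of the original answer) is to let find run dirname on each match. A minimal sketch, with a hypothetical scratch tree and helper name:

```shell
# Scratch tree with invented names.
demo=$(mktemp -d)
mkdir -p "$demo/a" "$demo/b"
touch "$demo/a/foo" "$demo/a/fig" "$demo/b/bar"

dirs_portable() {
    # dirname is started once per matching file, so this is slower on big trees.
    (cd "$1" && find . -type f -name '*f*' -exec dirname {} \; | sort -u)
}

dirs_portable "$demo"
# prints:
# ./a
```

This trades speed for portability and, like the line-based versions above, does not cope with newlines in names.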

More robust version (Requires GNU tools)

The above versions will be confused by file names that include newlines. A more robust solution is to do the sorting on NUL-terminated strings:

find . -type f -name '*f*' -printf '%h\0' | sort -zu | sed -z 's/$/\n/'

Related Solutions

Where does the pattern occur in a match by find

The pattern given to -name has to match the entire base filename. The behaviour of the -name pattern is defined as:

The primary shall evaluate as true if the basename of the current pathname matches pattern

This means it's true when the whole of the basename matches the pattern you gave. You can think of a pattern as being basically like a shell glob: you can use *, ?, and [...] patterns inside it, with the start and end of the pattern aligned with the start and end of the string.

So your command:

find ~ -name bookmarks

finds files named "bookmarks" because that is the entire filename, but:

find ~ -name bookmark

would only find files named 'bookmark', because there are no wildcard characters in the pattern.

To match files called both bookmark and bookmarks, you could use:

find ~ -name 'bookmark*'

So if you want to find

those files whose names contain bookmark regardless the position of bookmark in the filename

you can use:

find ~ -name '*bookmark*'

to match files whose names have any number of characters, then bookmark, then any number of characters.
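The three patterns can be compared side by side on a scratch directory. The file names and the match helper below are assumptions made for this sketch only.

```shell
# Three sample files (hypothetical names) in a throwaway directory.
demo=$(mktemp -d)
touch "$demo/bookmark" "$demo/bookmarks" "$demo/my-bookmarks.html"

match() {
    # Print the base names of regular files under $demo accepted by -name "$1".
    find "$demo" -type f -name "$1" | sed -E 's|.*/||' | sort
}

match 'bookmark'      # bookmark
match 'bookmark*'     # bookmark, bookmarks
match '*bookmark*'    # bookmark, bookmarks, my-bookmarks.html
```

Quoting the pattern matters: without the quotes the shell may expand the * against the current directory before find ever sees it.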
