Shell – Recursively list all directories that contain one or more jpg image files

files find shell

I'm trying to tidy up my photos which are, for various historic reasons, scattered all over my system. To make a start on this task, I've been trying to use the command line to construct a list of all directories that contain one or more jpg files. I'm certain that I don't have to be concerned about looking for other image file formats, but I do have to allow for jpg appearing in upper and lower case.

I'd like each directory name to appear only once in the final list. As an example, if I have the following directories, each of which contains one or more jpg or JPG files:

~Mike/Pictures
~Mike/Pictures/London/Olympics
~Mike/Pictures/London
~Mike/Pictures/London/Holiday
~Mike/Photos
~Mike/Family History/Swaine

I'd like the results to appear with each directory listed only once – irrespective of the number of image files it might contain – preferably sorted and then written to a file:

~Mike/Family History/Swaine
~Mike/Photos
~Mike/Pictures
~Mike/Pictures/London
~Mike/Pictures/London/Holiday
~Mike/Pictures/London/Olympics

My command line skills are just not up to this! I can use a lot of the simpler forms of single commands, but once they get complex and/or have to be piped things tend to go wrong.

Best Answer

Assuming JPEG image files have the suffix .jpg:

find "$HOME" -type f -name '*.jpg' \
    -exec sh -c 'for d; do dirname "$d"; done' sh {} + | sort -u -o jpeg_dirs.txt

This relies on none of your directory names containing newlines.

With GNU find:

find "$HOME" -type f -name '*.jpg' -printf '%h\n' | sort -u -o jpeg_dirs.txt

These find commands will find all JPEG images under your home directory and print the names of the directories where they were found. The sort -u will take this list of directory names, sort it, and remove duplicates. The result will be written to the file jpeg_dirs.txt in the current directory.
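Since the question also asks for the upper-case suffix, the first command can be extended to test for both spellings. The following is only a sketch under that assumption; the function name list_jpeg_dirs and its optional directory argument are made up for illustration and are not part of the original answer.

```shell
# list_jpeg_dirs is a hypothetical helper name, not part of the original answer.
# Usage: list_jpeg_dirs [top-directory]   (defaults to "$HOME")
list_jpeg_dirs() {
    # Match regular files named *.jpg or *.JPG, print the directory part of
    # each pathname, then sort the list and drop duplicates.
    find "${1:-$HOME}" -type f \( -name '*.jpg' -o -name '*.JPG' \) \
        -exec sh -c 'for pathname do dirname "$pathname"; done' sh {} + |
        sort -u
}
```

Running list_jpeg_dirs "$HOME" >jpeg_dirs.txt would then write the list to a file. Mixed-case suffixes such as .Jpg are not matched by this sketch; with a find that supports the -iname test (GNU and BSD find do), a single -iname '*.jpg' would cover those as well.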


Looking back at this in early 2021 (3.3 years later) I cringe a bit because my solution above, albeit not wrong per se, is a bit backwards. It also makes the obvious assumption about "nice filenames" (no newlines).
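If newlines in directory names are a real concern, one possible workaround is to delimit the names with NUL characters instead of newlines. This is a sketch that assumes GNU find (for -printf) and GNU sort (for -z); the function name list_jpeg_dirs0 is hypothetical.

```shell
# list_jpeg_dirs0 is a hypothetical helper name; it needs GNU find and GNU sort.
# Usage: list_jpeg_dirs0 [top-directory]   (defaults to "$HOME")
list_jpeg_dirs0() {
    # %h is the directory part of each found pathname; every name is
    # terminated by a NUL character, so embedded newlines stay intact.
    find "${1:-$HOME}" -type f \( -name '*.jpg' -o -name '*.JPG' \) -printf '%h\0' |
        sort -z -u
}
```

The output is a NUL-delimited list, so it is best consumed by tools that understand that format, for example xargs -0.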

When you're using find to search for directories, don't search for regular files as I did above; actually search for directories. Once we have the directories, we can look in each of them and see if there is a file matching *.jpg or *.JPG (further filename suffixes are easy to add):

find "$HOME" -type d -exec bash -O nullglob -O dotglob -O extglob -c '
    for dirpath do
        set -- "$dirpath"/*.@(jpg|JPG)
        [[ "$#" -gt 0 ]] && printf "%s\n" "$dirpath"
    done' bash {} +

This peeks into each directory from your home directory down and tries to expand the globbing pattern *.@(jpg|JPG) in each. This pattern, which could also have been written as two separate patterns, *.jpg and *.JPG, matches all the files that we're looking for. If at least one name matches, we assume that this is a directory whose name we want to output. This gives false positives for directories in which the only matching names belong to subdirectories (for example, a subdirectory called foo.jpg).
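The false positives can be avoided by testing each matching name with -f before printing the directory. The following is a sketch along those lines, not the original answer's command; the function name dirs_with_jpeg_files and its optional directory argument are hypothetical.

```shell
# dirs_with_jpeg_files is a hypothetical helper name.
# Usage: dirs_with_jpeg_files [top-directory]   (defaults to "$HOME")
dirs_with_jpeg_files() {
    find "${1:-$HOME}" -type d -exec bash -O nullglob -O dotglob -O extglob -c '
        for dirpath do
            for pathname in "$dirpath"/*.@(jpg|JPG); do
                # -f is also true for a symbolic link to a regular file.
                if [[ -f $pathname ]]; then
                    printf "%s\n" "$dirpath"
                    break
                fi
            done
        done' bash {} +
}
```

Each directory is visited once, so the output has no duplicates; piping it through sort -o jpeg_dirs.txt would produce the sorted file asked for in the question.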

The shell options that we run our internal bash script with allow us to match hidden names (dotglob), make the globbing pattern disappear completely if it doesn't match anything rather than remain unexpanded (nullglob), and let us use the ksh-inspired extended globbing pattern @(...|...).
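The effect of nullglob on the "$#" test can be seen in isolation with a small experiment. This is an illustrative sketch only; the helper name count_jpg_matches and its arguments are made up for the demonstration.

```shell
# count_jpg_matches is a hypothetical helper: it prints how many words the
# pattern DIR/*.jpg expands to in bash, with or without nullglob.
# $1: directory to look in; $2: the word "nullglob" to enable that option
count_jpg_matches() {
    if [ "${2-}" = nullglob ]; then
        bash -O nullglob -c 'set -- "$1"/*.jpg; printf "%s\n" "$#"' bash "$1"
    else
        bash -c 'set -- "$1"/*.jpg; printf "%s\n" "$#"' bash "$1"
    fi
}
```

For an empty directory, count_jpg_matches dir prints 1, because the unmatched pattern is left in place as a literal word, while count_jpg_matches dir nullglob prints 0. That difference is why the "$#" test is only meaningful with nullglob set.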

Using the zsh shell:

typeset -U list=(~/**/*.(jpg|JPG)(.DN:h))
print -rC1 $list

This creates an array variable, list, that has the property that it only stores unique elements. It is initialized to the result of expanding a filename globbing pattern. The pattern matches all JPEG image files in or below the home directory, and the :h at the end removes the actual filename from the generated pathnames. The . makes the pattern match only regular files, and D and N act like dotglob and nullglob in bash.
