Linux – How to find invalid images

find imagemagick linux search

I have a directory with sub-directories. In the directories, there are a lot of images, crawled from the web.

How do I loop through every file and show those files which are not valid image files?

It should not be based on file extension.

I came up with this script:

find . -name '*.jpg' -o -name '*.jpeg' -o -name '*.gif' -o -name '*.png' | while read FILE; do
    if ! identify "$FILE" &> /dev/null; then
         echo "$FILE"
    fi  
done

But this does not work, because it also outputs valid images.

Best Answer

find . -type f \
       \( -name '*.jpg' -o -name '*.jpeg' -o -name '*.gif' -o -name '*.png' \) \
       -exec sh -c '! file -b --mime-type "$1" | grep -q "^image/"' sh {} \; \
       -print

My approach uses -exec to perform a custom test on files. A shell is needed to construct a pipe. A separate shell is run for every file with a matching extension, so the solution performs rather poorly.

The shell runs file -b --mime-type, then grep checks if the result begins with image/. ! at the beginning of the pipe negates its exit status, so the entire -exec test succeeds iff the file is not really an image. The path is then printed.
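The negated pipeline can be tried on its own before wiring it into find. The following is a minimal sketch, assuming a file implementation that supports -b and --mime-type; the function name is_not_image is made up for illustration.

```shell
# Hypothetical helper: exits with status 0 when "$1" is NOT detected as an image.
is_not_image() {
    # file -b prints only the MIME type, e.g. "image/png" or "text/plain".
    # grep -q succeeds if that type starts with "image/".
    # The leading ! inverts the exit status of the whole pipeline.
    ! file -b --mime-type "$1" | grep -q '^image/'
}
```

For example, `is_not_image photo.jpg && echo "photo.jpg is not an image"` prints the message only when file does not report an image/* type for photo.jpg.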

Notes:

  1. Omit -name tests to check all files.
  2. Or you may want to use -iname instead of -name.
  3. -iname is not required by POSIX, though. Neither are the -b and --mime-type options of file.
  4. The following yields a slightly different output and it's faster:

    find . -type f \
           \( -name '*.jpg' -o -name '*.jpeg' -o -name '*.gif' -o -name '*.png' \) \
           -exec file --mime-type {} + \
    | grep -v "\bimage/"
    

    but some filenames (e.g. with newlines) or paths (with image/) will break the logic.
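
A possible middle ground, not part of the original answer, is to keep -exec ... + but let one shell loop over each batch of paths and test them one at a time. The sketch below assumes the same non-POSIX file options as above; the function name list_non_images is hypothetical.

```shell
# Hypothetical helper: print every file under "$1" (default: .) that has an
# image extension but is not reported as image/* by file.
list_non_images() {
    find "${1:-.}" -type f \
         \( -name '*.jpg' -o -name '*.jpeg' -o -name '*.gif' -o -name '*.png' \) \
         -exec sh -c '
             # find passes a batch of paths as positional parameters.
             for f do
                 # -b omits the filename from the output, so newlines or
                 # "image/" inside a path cannot confuse the check.
                 case $(file -b --mime-type "$f") in
                     image/*) ;;                  # detected as an image: stay silent
                     *) printf "%s\n" "$f" ;;     # anything else: report the path
                 esac
             done
         ' sh {} +
}
```

Running `list_non_images .` should list the same files as the first command. It still starts file once per path, so it is likely slower than the pipeline in note 4, but it avoids starting one shell per file.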
