Automating the choice between JPEG and PNG with a script

conversion imagemagick jpeg png unix

Choosing the right format to save your images in is crucial for preserving image quality and reducing artifacts. Different formats follow different compression methods and come with their own set of advantages and disadvantages.

JPEG, for instance, is suited to real-life photographs that are rich in color gradients. The lossless PNG, on the other hand, is far superior when it comes to schematic figures.

Picking the right format can be a chore when working with a large number of files. That's why I would love to find a way to automate it.


A little bit of background on my particular use case:

I am working on a number of handouts for a series of lectures at my university. The handouts are rich in figures, which I have to extract from PDF-formatted slides. Extracting these images gives me lossless PNGs, which are needlessly large at times.

Converting these particular files to JPEG can shrink them to less than 20% of their original size with no noticeable loss in quality. This matters because working with hundreds of large images in a word processor is pretty crash-prone.

Batch converting all extracted PNGs to JPEGs is not an option, as many if not most of the images are better kept as PNGs. Converting these would result in insignificant size reductions and sometimes even increases in file size – that's at least what my test runs showed.


What we can take from this is that file size after compression can serve as an indicator of which format is best suited to a particular image. It's not a particularly accurate predictor, but it works well enough. So why not use it in the form of a script?

I included inotifywait because I would prefer the script to be executed automatically as soon as I drag an extracted image into a folder.

This is a simpler version of the script that I've been using for the last couple of weeks:

#!/bin/bash
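# Watch the directories listed in the include file and convert every new file to JPEG.
# '\.jpg$' excludes only names ending in .jpg; IFS= and -r keep spaces and backslashes intact.
inotifywait -m -r --format "%w%f" --exclude '\.jpg$' -e create -e moved_to \
    --fromfile '/home/MHC/.scripts/Workflow/Conversion/include_inotifywait' |
while IFS= read -r file; do
    mogrify -format jpg -quality 92 "$file"
done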

The advanced version of the script would have to

  • be able to handle spaces in file names and directory names
  • preserve the original file names
  • flatten PNG images if an alpha value is set
  • compare the file size between the temporary converted image and its original
  • determine if the difference is greater than a given percentage
  • act accordingly

The actual conversion could be done with imagemagick tools:

convert file.png -background white -flatten -quality 92 file.jpg

Unfortunately, my bash skills aren't even close to advanced enough to convert the scheme above into an actual script, but I am sure many of you can.

My reputation points on here are pretty low, but I will gladly award the most helpful answer with the highest bounty I can set.

References: http://www.formortals.com/introducing-cnb-imageguide/,
http://www.turnkeylinux.org/blog/png-vs-jpg

Edit: Also see my comments below for some more information on why I think this script would be the best solution to the problem I am facing.

Best Answer

Edit: Fixed some issues with the original script. Added an alternative one based on Marcks Thomas' proposition.

Edit 2: Updated cutoff values based on a number of test runs. I am still not sure how to estimate file sizes for greyscale images. If you are working with a large number of images outside of RGB colour schemes you might want to implement the first script as a fallback mode to the second one.

Edit 3: Added optipng integration, which optimizes PNG file sizes without any quality loss. Also made some smaller improvements.


Version 0.1

Important note: This script is deprecated. Newer versions are far more efficient.

Alright, my question might have been slightly too localized, so I put some time into it and compiled the script myself:

#!/bin/bash

# AUTHOR:   (c) MHC (http://askubuntu.com/users/81372/mhc)
# NAME:     Intelliconvert 0.1
# DESCRIPTION:  A script to automate and optimize the choice between different image formats.
# LICENSE:  GNU GPL v3 (http://www.gnu.org/licenses/gpl.html)
# REQUIREMENTS:  Imagemagick

ORIGINAL="$1"

###Filetype check###

MIME=$(file -ib "$ORIGINAL")

if [ "$MIME" = "image/png; charset=binary" ]
  then
    echo "PNG Mode"

###Variables###

      ##Original Image##
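    FILENAME=$(basename "$ORIGINAL")
    PARENTDIR=$(dirname "$ORIGINAL")
    # Keeps the path from the 10th "/"-separated field on; adjust to your own directory depth
    SUBFOLDER=$(echo "$PARENTDIR" | cut -d"/" -f10-)
    # Use $HOME here: a quoted "~" is not expanded by the shell
    ORIGARCHIVE="$HOME/ORIG"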

      ##Converted Image##
    TEMPDIR="/tmp/imgcomp"
    CONVERTED="$TEMPDIR/$FILENAME.jpg"

      ##Image comparison##
    DIFFLO="50"
    DIFFHI="75"
    CUTOFF="1000000"

      ##DEBUG
    echo "#### SETTINGS ####"
    echo "Filepath to original = $ORIGINAL"
    echo "Filename= $FILENAME"
    echo "Parent directory = $PARENTDIR"
    echo "Archive directory = $ORIGARCHIVE"
    echo "Temporary directory = $TEMPDIR"
    echo "Filepath to converted image = $CONVERTED"
    echo "Low cut-off = $DIFFLO"
    echo "High cut-off = $DIFFHI"

###Conversion###

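    mkdir -p "$TEMPDIR"  # the temporary directory may not exist yet
    # Read the image first, then set the background before flattening
    convert "$ORIGINAL" -background white -flatten -quality 92 "$CONVERTED"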

###Comparison###

    F1=$(stat -c%s "$ORIGINAL" )
    F2=$(stat -c%s "$CONVERTED" )
    FQ=$(echo "($F2*100/$F1)" | bc)

      #Depending on filesize we use a different Cut-off#
    if [ "$F1" -ge "$CUTOFF" ]
      then
        DIFF="$DIFFHI"
      else  
        DIFF="$DIFFLO"
    fi

      ##DEBUG
    echo "### COMPARISON ###"
    echo "Filesize original = $F1 Bytes"
    echo "Filesize converted = $F2 Bytes"
    echo "Chosen cut-off = $DIFF %"
    echo "Actual Ratio = $FQ %"


    if [ "$FQ" -le "$DIFF" ]
      then
           echo "JPEG is more efficient, converting..."
           mv -v "$CONVERTED" "$PARENTDIR"
           mkdir -p "$ORIGARCHIVE/$SUBFOLDER"
           mv -v "$ORIGINAL" "$ORIGARCHIVE/$SUBFOLDER"
      else
           echo "PNG is fine, exiting."
           rm -v "$CONVERTED"
    fi


  else
    echo "File does not exist or unknown MIME type, exiting."

fi

The script works great in combination with Watcher.
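
If Watcher is not an option, the inotifywait loop from the question could feed the script instead. The following is only a minimal sketch: the function name `handle_paths` is made up for illustration, and it assumes one path per line, so file names containing newlines would still break it.

```bash
#!/bin/bash
# handle_paths HANDLER [ARGS...]   (hypothetical helper, not part of the script above)
# Reads one path per line from stdin and runs HANDLER once per path.
# IFS= and -r keep leading spaces and backslashes in file names intact.
handle_paths() {
    local path
    while IFS= read -r path; do
        [ -n "$path" ] || continue        # skip empty lines
        "$@" "$path" < /dev/null          # keep the handler from eating the remaining paths
    done
}

# Demonstration with printf standing in for the real script:
printf '%s\n' '/tmp/my slides/figure 1.png' | handle_paths printf 'would process: %s\n'
```

With inotifywait this could be wired up as `inotifywait -m -r -q --format '%w%f' -e create -e moved_to "$HOME/incoming" | handle_paths "$HOME/bin/intelliconvert"`, where both paths are placeholders for your own watch folder and script location.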

This is my first proper script, so there might be some unresolved bugs and issues I just didn't see. Feel free to use it for yourself and improve it. If you do so, I'd appreciate it if you could leave a comment here, so that I can learn from it.
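
For anyone who wants to experiment with the thresholds, the decision rule of Version 0.1 can be isolated from the file handling. This is a rough sketch rather than a drop-in replacement: the function names are hypothetical, and the numbers simply mirror DIFFLO, DIFFHI and CUTOFF from the script above.

```bash
#!/bin/bash
# size_ratio ORIG_BYTES CONVERTED_BYTES
# Prints the converted size as an integer percentage of the original size.
size_ratio() {
    echo $(( $2 * 100 / $1 ))
}

# pick_cutoff ORIG_BYTES
# Prints 75 for originals of at least 1000000 bytes, otherwise 50.
pick_cutoff() {
    if [ "$1" -ge 1000000 ]; then echo 75; else echo 50; fi
}

# choose_format ORIG_BYTES CONVERTED_BYTES
# Prints "jpg" if the JPEG is small enough relative to the PNG, otherwise "png".
choose_format() {
    local ratio cutoff
    ratio=$(size_ratio "$1" "$2")
    cutoff=$(pick_cutoff "$1")
    if [ "$ratio" -le "$cutoff" ]; then echo jpg; else echo png; fi
}

choose_format 2000000 600000   # ratio 30 %, cutoff 75 % -> jpg
choose_format 200000 150000    # ratio 75 %, cutoff 50 % -> png
```

Keeping the arithmetic in small functions like these makes it easy to try other cut-off values without running convert on real images.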


Version 0.2.1

A more efficient way of finding the right format can be achieved by comparing the original's file size to its estimated size as an uncompressed image:

#!/bin/bash

# AUTHOR:   (c) MHC (http://askubuntu.com/users/81372/mhc)
# NAME:     Intelliconvert 0.2.1
# DESCRIPTION:  A script to automate and optimize the choice between different image formats.
# LICENSE:  GNU GPL v3 (http://www.gnu.org/licenses/gpl.html)
# REQUIREMENTS:  Imagemagick, Optipng

################ Filetype Check#################

MIME=$(file -ib "$1")

if [ "$MIME" = "image/png; charset=binary" ]
  then
    echo "###PNG Mode###"

####################Settings####################

##Folders##
ORIGARCHIVE="$HOME/ORIG"  # a quoted "~" would not be expanded by the shell

##Comparison##
DIFFLO="25"
DIFFHI="20"
CUTOFF="1000000"

################################################

###Variables###

ORIGINAL="$1"
FILENAME=$(basename "$ORIGINAL")
PARENTDIR=$(dirname "$ORIGINAL")
SUBFOLDER=$(echo "$PARENTDIR" | cut -d"/" -f10-)
CONVERTED="$PARENTDIR/$FILENAME.jpg"

#DEBUG#
    echo "###SETTINGS###"
    echo "Filepath to original = $ORIGINAL"
    echo "Filename= $FILENAME"
    echo "Parent directory = $PARENTDIR"
    echo "Archive directory = $ORIGARCHIVE"
    echo "Filepath to converted image = $CONVERTED"
    echo "Low cut-off = $DIFFLO"
    echo "High cut-off = $DIFFHI"


###Image data###

        WIDTH=$(identify -format "%w" "$ORIGINAL")
        HEIGHT=$(identify -format "%h" "$ORIGINAL")
        ZBIT=$(identify -format "%z" "$ORIGINAL")
        COL=$(identify -format "%[colorspace]" "$ORIGINAL")
        F1=$(stat -c%s "$ORIGINAL")

        # Newer ImageMagick releases report "sRGB" instead of "RGB"
        if [ "$COL" = "RGB" ] || [ "$COL" = "sRGB" ]
          then
              CHANN="3"
          else
              CHANN="1"
        fi


###Cutoff setting###

    if [ "$F1" -ge "$CUTOFF" ]
      then
        DIFF="$DIFFHI"
      else  
        DIFF="$DIFFLO"
    fi


###Calculations on uncompressed image###

        BMPSIZE=$(echo "($WIDTH*$HEIGHT*$ZBIT*$CHANN/8)" | bc)
        FR=$(echo "($F1*100/$BMPSIZE)" | bc)

#DEBUG#

        echo "###IMAGE DATA###"
        echo "Image Dimensions = $WIDTH x $HEIGHT"
        echo "Colour Depth = $ZBIT"
        echo "Colour Profile = $COL"
        echo "Channels = $CHANN"
        echo "Estimated uncompressed size = $BMPSIZE"
        echo "Actual file size = $F1"
        echo "Estimated size ratio = $FR %"
        echo "Cutoff at $DIFF %"

###Backup###

        echo "###BACKUP###"
        mkdir -p "$ORIGARCHIVE/$SUBFOLDER"  #keep the original folder structure
        cp -v "$ORIGINAL" "$ORIGARCHIVE/$SUBFOLDER"
        echo ""

###Comparison###

    if [ "$FR" -ge "$DIFF" ]
      then
          echo "JPEG is more efficient, converting..."
          # Read the image first, then set the background before flattening
          convert "$ORIGINAL" -background white -flatten -quality 92 "$CONVERTED"
          echo "Done."
          echo "Cleaning up..."
          rm -v "$ORIGINAL"
      else
          echo "PNG is fine, passing over to optipng."
              echo "Optimizing..."
              optipng "$ORIGINAL"
              echo "Done."
    fi

################ Filetype Check#################

  else
    echo "File does not exist or unknown MIME type, exiting."

fi
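
To make the estimate in the script above easier to follow, here is the same integer arithmetic on its own, with made-up numbers. The function names are hypothetical and the image values are only an example: a 1024 x 768 image with 8 bits per channel and 3 channels.

```bash
#!/bin/bash
# raw_size WIDTH HEIGHT BITS_PER_CHANNEL CHANNELS
# Prints the estimated uncompressed size in bytes (same formula as BMPSIZE).
raw_size() {
    echo $(( $1 * $2 * $3 * $4 / 8 ))
}

# compression_ratio FILE_BYTES RAW_BYTES
# Prints the PNG size as an integer percentage of the uncompressed estimate (same as FR).
compression_ratio() {
    echo $(( $1 * 100 / $2 ))
}

raw=$(raw_size 1024 768 8 3)          # 2359296 bytes
compression_ratio 1500000 "$raw"      # prints 63
compression_ratio 40000 "$raw"        # prints 1
```

A 1.5 MB PNG of that size still takes up 63 % of the raw estimate, which is above the 20 % cut-off for large files, so it would be converted to JPEG. A 40 kB PNG comes out at 1 %, well below the 25 % cut-off for small files, so it would stay a PNG and go to optipng.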

Props to @Marcks Thomas for the great idea.
