Shell – Find, count and sort all audio files. ALAC (M4A) files

audio, find, shell-script, sort

I'm putting a script together to sort a very large music collection. It's approx 22000 albums, a mixture of FLAC, WAV, AIFF, M4A (AAC & ALAC).

So far, I can sort by file type and get a total size of each type.

ftypes=$(find . -type f | grep -iE ".*\.[a-zA-Z0-9]*$" | sed -e 's/.*\(\.[a-zA-Z0-9]*\)$/\1/' | sort -f | uniq -i)

for ft in $ftypes
do
echo -n "$ft "
find . -name "*${ft}" -print0 | xargs -0 du -hc | grep total | awk '{print $1}'
done

I'd like to edit this to get the number of files by file type as well as the total size.

Now, the M4A files could be either AAC or ALAC, and I'd like to know how many of each.

I can find and print a list of ALAC files with

find . -name \*.m4a | while read file; do avprobe "$file" 2>&1 | grep -q 'Audio: alac' && echo "$file"; done

but I'm lost on how to get the total file count and size, instead of a list of file names, and combine it all into one script.

Basically, I'd like to output:

  • List of file types
  • Number of files by filetype
  • Total size by filetype
  • M4A total number of AAC & total size
  • M4A total number of ALAC & total size

Depending on how well this works, I may consider using this to sort files into directories based on the output.

Best Answer

You should break your goal down into several smaller steps. This has two advantages:

  • Each step is easier to solve,
  • The resulting code will be clearer and more reusable.

The script below follows these steps:

  1. Generate raw statistics files. An easy way is to append each file's size and name to a temporary file named after the file's extension. So, if you have the file /path/to/foo.mp3, which is 3000000 bytes in size, the line 3000000 /path/to/foo.mp3 is appended to a temporary file named mp3.
  2. Handle special cases. Here the script processes the temporary file m4a and splits it into two other files, m4a_aac and m4a_alac, based on the test you gave in the question.
  3. Generate the output. With all the needed information now available, it only has to:
    • Count the number of lines in each temporary file to get the number of files of this type,
    • Sum the sizes to get the total size of the files of this type.
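
To make the listing format concrete, here is a minimal sketch of step 3 run on a hand-written listing. The three paths and sizes are made-up sample data, not files from the question; only the "size path" line format comes from the steps above.

```shell
#!/bin/sh
# Hypothetical sample data: three "size path" lines, as step 1 would
# write them into the per-extension listing named "flac".
statsdir=$(mktemp -d "${TMPDIR:-/tmp}/tmp.XXXXXXXXXX") || exit 1
cat > "$statsdir/flac" <<'EOF'
3000000 /music/a/one.flac
4500000 /music/a/two.flac
2500000 /music/b/three four.flac
EOF

# Step 3: the number of lines is the number of files, and the first
# column adds up to the total size in bytes. Spaces in a path do not
# matter because only the first field is summed.
count=$(awk 'END {print NR}' "$statsdir/flac")
totalsize=$(awk '{s += $1} END {printf "%.0f\n", s}' "$statsdir/flac")
printf '%s %d %s\n' flac "$count" "$totalsize"

rm -rf "$statsdir"
```

The sum is printed with printf "%.0f" rather than a bare print because some awk implementations switch to exponent notation for very large totals.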

Here is the script:

#!/bin/sh

# This script takes the searched directory as first parameter.
# For instance: ./this-script.sh ~/Music

: ${1:?"You must pass the search directory as first parameter."}
searchdir="$1"

# Create a temporary directory
statsdir=""
trap 'rm -rf "$statsdir"' EXIT
statsdir=$(mktemp -d "/tmp/tmp.XXXXXXXXXX") || exit 1

# Generate one listing file per extension (lowercased, so .M4A and .m4a
# end up in the same listing)
awkscript='/\.[[:alnum:]]+$/ {print $0 > (statsdir "/" tolower($NF))}'
# This script uses the Mac form of stat: stat -f "%z %N"
# On Linux, replace it with: stat -c "%s %n"
find "$searchdir" -type f -exec stat -f "%z %N" {} + | \
    awk -F '.' -v statsdir="$statsdir" "$awkscript"

# Distinguish between m4a/AAC and m4a/ALAC
if [ -f "$statsdir/m4a" ]; then
    input="$statsdir/m4a"
    while IFS= read -r line; do
        filename=${line#* }
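        # </dev/null guards the loop's input in case avprobe reads standard input
        if avprobe "$filename" 2>&1 </dev/null | grep -q 'Audio: alac'; then
            printf '%s\n' "$line" >> "$statsdir/m4a_alac"
        else
            printf '%s\n' "$line" >> "$statsdir/m4a_aac"
        fi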
    done < "$input"
    rm "$statsdir/m4a"
fi

# Generate and display result
{
    printf "Type Count Size\n"
    for statsfile in "$statsdir"/*; do
        [ -f "$statsfile" ] || continue
        extension=${statsfile##*/}
        # One awk pass: NR is the file count, the first column sums to the size in bytes
        awk -v ext="$extension" '{s += $1} END {printf "%s %d %.0f\n", ext, NR, s}' "$statsfile"
    done
} | column -t
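
The script hard-codes one form of stat, so it has to be edited when moving between Linux and Mac. One possible way around that, sketched here as an assumption rather than as part of the original answer, is to probe which form the installed stat accepts and keep the option and format in two variables (statflag and statfmt are names invented for this example). The demo directory and its single file exist only to show the output.

```shell
#!/bin/sh
# Pick the stat option and format that the local stat understands.
# GNU stat accepts -c; BSD stat (Mac) rejects it, so fall back to -f.
if stat -c '%s %n' / >/dev/null 2>&1; then
    statflag='-c'; statfmt='%s %n'    # GNU coreutils (Linux)
else
    statflag='-f'; statfmt='%z %N'    # BSD stat (Mac)
fi

# Demo on a throwaway directory holding one 5-byte file.
demo=$(mktemp -d "${TMPDIR:-/tmp}/tmp.XXXXXXXXXX") || exit 1
printf 'hello' > "$demo/sample.flac"

# Same find invocation as in the script, with the probed option and format.
listing=$(find "$demo" -type f -exec stat "$statflag" "$statfmt" {} +)
printf '%s\n' "$listing"

rm -rf "$demo"
```

The -c test has to come first: GNU stat also has a -f option, but there it means "report on the file system", so probing -f first would pick the wrong branch on Linux.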