Using ffmpeg to split an Audible audio-book into chapters

audio ffmpeg

I've been following this answer to use ffmpeg to convert and play some of my Audible audio-books in LinuxMint. Each book is a single source-file, but I've noticed that ffmpeg lists all the chapters at the start of conversion.

Is there a way to get ffmpeg to split the book into chapters, i.e. convert each chapter into a separate file? Preferably with ffmpeg alone, but using other programs/scripts (together with ffmpeg) is also an option…

(I've seen a few other answers about splitting DVDs into chunks of even lengths or into chapters (using ffmpeg and a python-script), but that's not quite what I'm after, so I'm hoping there is a simpler way of doing it…)

Best Answer

I've been doing exactly this myself recently. As Nemo commented above, ffprobe will give you the chapter start and end times as JSON with the command...

ffprobe -i fileName -print_format json -show_chapters

If you add -sexagesimal to the command, the output is slightly more human-readable IMO, and it can be redirected to a file for later processing.
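
Should a later step need plain numbers again, a sexagesimal timestamp can be turned back into milliseconds with shell arithmetic alone. The following is only a minimal sketch: to_millis is a made-up helper name (not part of ffmpeg or ffprobe), and it assumes the H:MM:SS.ffffff layout that ffprobe prints with -sexagesimal.

```shell
#!/bin/bash
# to_millis is a hypothetical helper, not part of ffmpeg/ffprobe.
# Converts H:MM:SS[.fraction] (as printed by ffprobe -sexagesimal) to milliseconds.
to_millis() {
  local h m s frac=0
  IFS=: read -r h m s <<< "$1"
  if [[ $s == *.* ]]; then
    frac=${s#*.}      # digits after the decimal point
    s=${s%%.*}        # whole seconds
  fi
  frac="${frac}000"   # pad so there are at least three digits
  frac=${frac:0:3}    # keep millisecond precision (truncates, does not round)
  # 10# forces base 10 so values such as "08" are not read as octal
  echo $(( (10#$h * 3600 + 10#$m * 60 + 10#$s) * 1000 + 10#$frac ))
}

to_millis "0:53:26.908000"   # prints 3206908
```

With a time_base of 1/1000, that result is the same number ffprobe reports in the raw "end" field for a chapter ending at 0:53:26.908000.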

FFmpeg needs a little help, so I also used jq and AtomicParsley: the former to parse the JSON file, the latter to add images and metadata to the resultant m4b file.

The script also supports m4a output, or conversion to mp3 as required. Simply call it with the parameters $1 (input file) and, optionally, $2 (output type, which defaults to m4b).
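
The optional second argument could also be validated up front rather than silently falling back to m4b. This is a sketch of that alternative, not what the script in this answer does (it uses an if/elif chain and treats anything unrecognised as m4b); select_codec is a hypothetical helper name.

```shell
#!/bin/bash
# select_codec is a hypothetical helper: it prints the ffmpeg codec for an
# output type, or fails for a type that is not handled.
select_codec() {
  case "${1:-m4b}" in                  # no argument given -> default to m4b
    mp3)      echo "libmp3lame" ;;     # re-encode to MP3
    m4a|m4b)  echo "copy" ;;           # keep the existing audio stream as-is
    *)        echo "unsupported output type: $1" >&2; return 1 ;;
  esac
}

select_codec mp3    # prints libmp3lame
select_codec        # prints copy
```

Failing early on an unknown type avoids producing a whole folder of m4b files when the argument was simply mistyped.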

Using that as a basis I created the following script...

#!/bin/bash

# script to convert m4b (audiobook) files with embedded chapters (e.g. converted from Audible) into individual chapter files

# required: ffmpeg; jq (json interpreter) & AtomicParsley (to embed pictures and add additional metadata to m4a/m4b AAC files)

# discover the file type (extension) of the input file
ext=${1##*.}
echo "extension: $ext"
# all files / folders are named based on the "shortname" of the input file
shortname=$(basename "$1" ".$ext")
picture=$shortname.jpg
chapterdata=$shortname.dat
metadata=$shortname.tmp
echo "shortname: $shortname"

# if an output type has been given on the command line, set parameters (used in ffmpeg command later)
if [[ $2 = "mp3" ]]; then
  outputtype="mp3"
  codec="libmp3lame"
elif [[ $2 = "m4a" ]]; then
  outputtype="m4a"
  codec="copy"
else
  outputtype="m4b"
  codec="copy"
fi
echo "outputtype: |$outputtype|"

# if it doesn't already exist, create a json file containing the chapter breaks (you can edit this file if you want chapters to be named rather than simply "Chapter 1", etc that Audible use)
[ ! -e "$chapterdata" ] && ffprobe -loglevel error \
            -i "$1" -print_format json -show_chapters -sexagesimal \
            >"$chapterdata"
read -p "Now edit the file $chapterdata if required. Press ENTER to continue."
# comment out above if you don't want the script to pause!

# read the chapters into arrays for later processing
# (process substitution avoids the word-splitting an unquoted here-string suffers in older bash versions)
readarray -t id < <(jq -r '.chapters[].id' "$chapterdata")
readarray -t start < <(jq -r '.chapters[].start_time' "$chapterdata")
readarray -t end < <(jq -r '.chapters[].end_time' "$chapterdata")
readarray -t title < <(jq -r '.chapters[].tags.title' "$chapterdata")

# create an ffmpeg metadata file to extract additional metadata lost in splitting files - deleted afterwards
ffmpeg -loglevel error -i "$1" -f ffmetadata "$metadata"
artist_sort=$(grep -m 1 '^sort_artist' "$metadata" | sed 's/.*=\(.*\)/\1/')
album_sort=$(grep -m 1 '^sort_album' "$metadata" | sed 's/.*=\(.*\)/\1/')
rm "$metadata"

# create directory for the output
mkdir -p "$shortname"
echo -e "\fID\tStart Time\tEnd Time\tTitle\t\tFilename"
for i in "${!id[@]}"; do
  trackno=$((i + 1))
  # set the name for output - currently in format <bookname>/<track number>. <bookname> - <chapter title>
  outname="$shortname/$(printf "%02d" "$trackno"). $shortname - ${title[$i]}.$outputtype"
  #outname=$(sed -e 's/[^A-Za-z0-9._ /-]/_/g' <<< "$outname")
  outname=$(sed 's/:/_/g' <<< "$outname")
  echo -e "${id[$i]}\t${start[$i]}\t${end[$i]}\t${title[$i]}\n\t\t$(basename "$outname")"
  ffmpeg -loglevel error -i "$1" -vn -c "$codec" \
            -ss "${start[$i]}" -to "${end[$i]}" \
            -metadata title="${title[$i]}" \
            -metadata track="$trackno" \
            -map_metadata 0 -id3v2_version 3 \
            "$outname"
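  # note: the script as shown never creates "$picture"; the cover image must already exist in the current directory
  [[ $outputtype == m4* ]] && AtomicParsley "$outname" \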
            --artwork "$picture" --overWrite \
            --sortOrder artist "$artist_sort" \
            --sortOrder album "$album_sort" \
            > /dev/null
done
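
The commented-out sed line in the loop hints at a stricter clean-up of file names than just replacing colons. One way to do that without sed is bash pattern substitution. In this sketch, sanitize_name is a hypothetical helper and the set of allowed characters is an assumption, so adjust it to taste. It is meant for the file name only, not the whole path, because it would also replace the / separator.

```shell
#!/bin/bash
# sanitize_name is a hypothetical helper, not part of the script above.
sanitize_name() {
  local name=$1
  # replace every character that is not a letter, digit, dot, underscore,
  # space or hyphen with an underscore (the hyphen comes last so it is literal)
  printf '%s\n' "${name//[^A-Za-z0-9._ -]/_}"
}

sanitize_name 'Chapter 1: The "End"?'   # prints Chapter 1_ The _End__
```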

If desired you can edit the JSON file (.dat file) as Audible files just name the chapters "Chapter 1", "Chapter 2", etc.

For example, initially the first part of the file might read...

{
    "chapters": [
        {
            "id": 0,
            "time_base": "1/1000",
            "start": 0,
            "start_time": "0:00:00.000000",
            "end": 3206908,
            "end_time": "0:53:26.908000",
            "tags": {
                "title": "Chapter 1"
            }
        },

Simply changing the relevant line to "title": "Introduction" will change the name of the resultant split file.
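
The same edit can be scripted with jq rather than made by hand. This is a minimal sketch, assuming jq 1.5 or later is installed; retitle_chapter is a hypothetical helper, and the index is the zero-based position in the chapters array.

```shell
#!/bin/bash
# retitle_chapter FILE INDEX TITLE  (hypothetical helper, not part of the script above)
retitle_chapter() {
  local file=$1 index=$2 title=$3 tmp
  tmp=$(mktemp) || return 1
  # --argjson passes the index as a number, --arg passes the title as a string
  if jq --argjson i "$index" --arg t "$title" \
        '.chapters[$i].tags.title = $t' "$file" > "$tmp"; then
    cat "$tmp" > "$file"    # overwrite in place, keeping the file's permissions
    rm -f "$tmp"
  else
    rm -f "$tmp"            # leave the original untouched if jq failed
    return 1
  fi
}

# example call (the file name is a placeholder):
# retitle_chapter "MyBook.dat" 0 "Introduction"
```

Writing to a temporary file first means a jq error cannot leave the .dat file half-written.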
