Ubuntu – Split audio into several pieces based on timestamps from a text file with sox or ffmpeg

Tags: audio, ffmpeg, sox, ubuntu

I looked at the following link: Trim audio file using start and stop times

But this doesn't completely answer my question. My problem is: I have an audio file such as abc.mp3 or abc.wav. I also have a text file containing start and end timestamps:

0.0 1.0 silence  
1.0 5.0 music  
6.0 8.0 speech

I want to split the audio into three parts using Python and sox/ffmpeg, resulting in three separate audio files.

How do I achieve this using either sox or ffmpeg?

Later I want to compute the MFCCs for each of those portions using librosa.
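For the MFCC step, the segments do not necessarily have to be written to separate files first: librosa can load the whole recording once, and each segment can then be sliced out of the sample array. A minimal sketch, assuming the start/end/label format shown above with times in seconds; parse_labels, slice_segments and segment_mfccs are hypothetical helper names, and n_mfcc=13 is only an example value (librosa's own default is 20).

```python
def parse_labels(lines):
    """Parse lines of the form `start end label` into (start, end, label) tuples."""
    segments = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # ignore blank lines
        start, end = float(parts[0]), float(parts[1])
        label = parts[2] if len(parts) > 2 else ""
        segments.append((start, end, label))
    return segments


def slice_segments(samples, sr, segments):
    """Cut (start, end, label) segments, given in seconds, out of a 1-D sample sequence.

    Only slicing is used, so this works on plain lists as well as NumPy arrays.
    """
    pieces = []
    for start, end, label in segments:
        first = int(round(start * sr))
        last = int(round(end * sr))
        pieces.append((label, samples[first:last]))
    return pieces


def segment_mfccs(path, segments, n_mfcc=13):
    """Load `path` with librosa and return one (label, MFCC matrix) pair per segment."""
    import librosa  # imported here so the helpers above work without librosa

    # sr=None keeps the file's native sample rate instead of resampling to 22050 Hz
    y, sr = librosa.load(path, sr=None, mono=True)
    return [
        (label, librosa.feature.mfcc(y=piece, sr=sr, n_mfcc=n_mfcc))
        for label, piece in slice_segments(y, sr, segments)
    ]
```

For example, segment_mfccs('abc.wav', parse_labels(open('times.txt'))) would return one (label, MFCC matrix) pair per line of the time file (times.txt is a placeholder name). librosa.load also accepts offset and duration arguments if you would rather load each segment separately.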

I have Python 2.7, ffmpeg, and sox on an Ubuntu Linux 16.04 installation.

Best Answer

I've just had a quick go at it, with very little in the way of testing, so hopefully it'll be of help. The script below relies on ffmpeg-python, but it wouldn't be hard to write it with subprocess instead.

At the moment, each line of the time file is treated as a start time and an end time, followed by an optional output name. Missing names are replaced with the line's index (0.wav, 1.wav, ...).

""" split_wav `audio file` `time listing`

    `audio file` is any file known by local FFmpeg
    `time listing` is a file containing multiple lines of format:
        `start time` `end time` [output name]

    times can be either MM:SS or plain seconds (e.g. 90 or 1.5)
    a missing output name becomes <line index>.wav, and a name
    without an extension (e.g. `music`) gets `.wav` appended
"""

import os
from sys import argv

import ffmpeg  # the ffmpeg-python package: pip install ffmpeg-python

_in_file = argv[1]

def make_time(elem):
    # allow user to enter times as MM:SS or as plain seconds;
    # check for ':' explicitly, since int('1.0') would raise ValueError
    if ':' in elem:
        minutes, seconds = elem.split(':')
        return int(minutes) * 60 + float(seconds)
    return float(elem)

def collect_from_file():
    """user can save times in a file, with start and end time on a line"""

    time_pairs = []
    with open(argv[2]) as in_times:
        for line_no, line in enumerate(in_times):
            tp = line.split()
            if not tp:
                continue  # skip blank lines
            start = make_time(tp[0])
            # ffmpeg's `t` option is a duration, not an end time
            duration = make_time(tp[1]) - start
            # if no name given, use the line index
            name = tp[2] if len(tp) > 2 else str(line_no) + '.wav'
            # ffmpeg picks the output format from the extension, so a bare
            # label such as `music` needs one
            if not os.path.splitext(name)[1]:
                name += '.wav'
            time_pairs.append((start, duration, name))
    return time_pairs

def main():
    for start, duration, name in collect_from_file():
        # open the input, seeking to `ss`, and read for duration `t`
        stream = ffmpeg.input(_in_file, ss=start, t=duration)
        # output to named file
        stream = ffmpeg.output(stream, name)
        # this was to make trial and error easier
        stream = ffmpeg.overwrite_output(stream)

        # and actually run
        ffmpeg.run(stream)

if __name__ == '__main__':
    main()
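Since subprocess was mentioned as an alternative, here is a rough sketch of that route, which also covers sox. It assumes the ffmpeg and sox binaries are on the PATH and takes the same (start, duration, name) triples that collect_from_file() builds; ffmpeg_cmd, sox_cmd and split are hypothetical names. sox's trim effect takes a start position followed by a length, and sox can only read MP3 if it was built with MP3 support (on Ubuntu, the libsox-fmt-mp3 package).

```python
import subprocess


def ffmpeg_cmd(src, start, duration, dst):
    """Build an ffmpeg command that copies `duration` seconds of `src` from `start`."""
    return ["ffmpeg", "-y",          # overwrite existing outputs, like overwrite_output()
            "-ss", str(start),       # seek on the input side
            "-t", str(duration),     # length of the piece, not its end time
            "-i", src, dst]


def sox_cmd(src, start, duration, dst):
    """Build the equivalent sox command: `trim` takes a start position and a length."""
    return ["sox", src, dst, "trim", str(start), str(duration)]


def split(src, pieces, use_sox=False):
    """Run one command per (start, duration, name) triple; raises on a non-zero exit."""
    build = sox_cmd if use_sox else ffmpeg_cmd
    for start, duration, name in pieces:
        subprocess.check_call(build(src, start, duration, name))
```

For example, split('abc.wav', [(0.0, 1.0, 'silence.wav'), (1.0, 4.0, 'music.wav'), (6.0, 2.0, 'speech.wav')], use_sox=True) would cut the three pieces from the question with sox; note that the second value of each triple is a duration, not an end time.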

Related Solutions

Split Video – How to Split Video File into Pieces with FFmpeg

You forgot to use backticks (or better, a $( ) subshell) around the seq invocation. This works:

for i in $( seq 50 );
do ffmpeg -i input.mpg -sameq -ss 00:`expr $i \* 2 - 2`:00 -t 00:02:00 output.mpg; done

Another thing is that you probably don't want output.mpg to be overwritten in each run, do you? :) Use $i in the output filename as well.

Apart from that, in bash you can use $(( )) or $[ ] instead of expr, which also looks clearer (in my opinion). There is also no need for seq: brace expansion is all you need to get a sequence. Here's an example:

for i in {1..50}
 do ffmpeg -i input.mpg -sameq -ss 00:$[ i* 2 - 2 ]:00 -t 00:02:00 output_$i.mpg
done

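Another good thing about braces is that you can have leading zeros in the names (very useful for sorting the files later). Bash arithmetic reads a number with a leading zero as octal, though, so 08 and 09 would fail with "value too great for base"; prefix the variable with 10# to force base 10:

for i in {01..50}
 do ffmpeg -i input.mpg -sameq -ss 00:$[ 10#$i*2 - 2 ]:00 -t 00:02:00 output_$i.mpg
done

Notice also that i*2 - 2 can be simplified to i*2 if you just change the range:

for i in {00..49}
 do ffmpeg -i input.mpg -sameq -ss 00:$[ 10#$i*2 ]:00 -t 00:02:00 output_$i.mpg
done

Two more caveats: current FFmpeg releases no longer accept -sameq, so drop it; and FFmpeg only accepts 00 to 59 in the MM field, so once the offset reaches an hour (i*2 >= 60) either write it as HH:MM:SS or pass plain seconds instead, e.g. -ss $[ 10#$i*120 ].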
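If you would rather drive the fixed-length split from Python, it can be sketched as below. It assumes the total duration is known in advance (for example from ffprobe), passes offsets in plain seconds, and zero-pads the output names; chunk_commands is a hypothetical helper, and -sameq is left out because current FFmpeg releases no longer accept it.

```python
import math


def chunk_commands(src, total_seconds, chunk_seconds=120, pattern="output_{:02d}.mpg"):
    """Return one ffmpeg command per fixed-length chunk of `src`.

    Offsets are given in plain seconds, so there is no MM field to overflow.
    """
    count = int(math.ceil(total_seconds / float(chunk_seconds)))
    commands = []
    for i in range(count):
        commands.append(["ffmpeg", "-i", src,
                         "-ss", str(i * chunk_seconds),   # start of chunk i
                         "-t", str(chunk_seconds),        # chunk length
                         pattern.format(i)])
    return commands
```

To actually cut the file, pass each command to subprocess.check_call; the last chunk is simply shorter if the duration is not a multiple of chunk_seconds.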

How to use sox or ffmpeg to detect silence intervals in a long audio file and replace them by zeros (aka suppress background noise)

Use the sox silence effect:

sox [input] [output] silence 1 1 2% -1 0.5 2%

This trims the silence at the front to 1 second and reduces gaps to half a second throughout the file. The 2% threshold ignores the noise floor in my case; 0% might work for you.

-1 tells sox to handle every silent stretch in the file rather than stopping at the first one.
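Note that the silence effect removes the quiet stretches rather than replacing them with zeros. If the file length has to be preserved, one alternative (not what sox does here, just a simple noise gate) is to zero every frame whose RMS level falls below a fraction of full scale. A minimal pure-Python sketch, assuming the samples are already decoded to floats in [-1.0, 1.0]; gate_silence, frame_len and threshold are hypothetical names, with threshold=0.02 only loosely mirroring the 2% above.

```python
import math


def gate_silence(samples, frame_len=1024, threshold=0.02):
    """Return a copy of `samples` with quiet frames replaced by zeros.

    A frame counts as quiet when its RMS level is below `threshold`
    (a fraction of full scale). Unlike sox's silence effect, the output
    has the same length as the input.
    """
    out = list(samples)
    for begin in range(0, len(out), frame_len):
        frame = out[begin:begin + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms < threshold:
            out[begin:begin + frame_len] = [0.0] * len(frame)
    return out
```

The frame length trades time resolution against stability of the level estimate; at 44.1 kHz, 1024 samples is roughly 23 ms.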
