How to use sox or ffmpeg to detect silence intervals in a long audio file and replace them by zeros (aka suppress background noise)

audio, ffmpeg, sox

I have a long audio file that was created by concatenating many short files. I would like to detect the silence between the speech segments (a simple threshold is enough for my purposes) and replace it with exact zeros so that there is no background "noise". It is important for me to retain the length of the recording.

I know that sox can detect silence at the beginning and end of a file and I can use silence, reverse, pad etc. to remove the samples and fill in the zeros. Is there a way to do it everywhere in the file, not just start+end?

UPD: this is probably a pretty complicated way of asking whether there are tools for voice activity detection on Linux.

Best Answer

Use the sox silence effect:

sox [input] [output] silence 1 1 2% -1 0.5 2%

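This strips silence from the front of the file until sox has seen 1 second of audio above the threshold, and then cuts every later gap that stays below the threshold for at least half a second. The 2% threshold is what it took to ignore the noise floor in my case; 0% might work for you.

The negative below-periods value (-1) tells sox to apply this to every silent stretch in the file, not just the first one. Note that the silence effect removes samples, so the output will be shorter than the input.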
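The question asks to keep the length of the recording, which the silence effect does not do because it drops samples. A length-preserving alternative is a plain threshold gate that overwrites quiet stretches with zeros. Below is a minimal sketch of that idea in Python using only the standard library. It is not part of sox or ffmpeg; the function names, the 20 ms analysis window, and the mono 16-bit PCM WAV assumption are all choices made for this illustration, and the defaults simply mirror the 2% and 0.5 second values used above.

```python
import array
import sys
import wave


def zero_silence(samples, rate, threshold=0.02, min_silence=0.5, window=0.02):
    """Return a copy of mono 16-bit samples with long quiet runs set to zero.

    A window is "quiet" when its peak is below `threshold` (fraction of full
    scale). Only runs of quiet windows lasting at least `min_silence` seconds
    are zeroed, so short pauses inside speech are left untouched.
    """
    out = array.array("h", samples)
    win = max(1, round(rate * window))            # samples per analysis window
    limit = threshold * 32768                     # threshold in sample units
    min_windows = max(1, -(-round(rate * min_silence) // win))  # ceiling division

    # Classify each window by its peak absolute value.
    quiet = [
        max(abs(s) for s in samples[i:i + win]) < limit
        for i in range(0, len(samples), win)
    ]

    run_start = None
    for idx, is_quiet in enumerate(quiet + [False]):  # sentinel closes a final run
        if is_quiet and run_start is None:
            run_start = idx
        elif not is_quiet and run_start is not None:
            if idx - run_start >= min_windows:
                lo = run_start * win
                hi = min(idx * win, len(out))
                out[lo:hi] = array.array("h", [0]) * (hi - lo)
            run_start = None
    return out


def zero_silence_wav(src, dst, **kwargs):
    """Gate a mono 16-bit PCM WAV file; the output keeps the same frame count."""
    with wave.open(src, "rb") as reader:
        params = reader.getparams()
        if params.sampwidth != 2 or params.nchannels != 1:
            raise ValueError("this sketch only handles mono 16-bit PCM")
        samples = array.array("h", reader.readframes(params.nframes))
    if sys.byteorder == "big":                    # WAV data is little-endian
        samples.byteswap()

    gated = zero_silence(samples, params.framerate, **kwargs)

    if sys.byteorder == "big":
        gated.byteswap()
    with wave.open(dst, "wb") as writer:
        writer.setparams(params)
        writer.writeframes(gated.tobytes())
```

With hypothetical file names, a call would look like zero_silence_wav("in.wav", "out.wav", threshold=0.02, min_silence=0.5). The gate switches abruptly at window boundaries, so if clicks are audible at the edges of the zeroed regions, a short fade in and out around each region would be the next thing to try.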