Ubuntu – the difference between find with -exec and xargs

I am trying to learn Bash scripting, and I want to execute a command on all files below my current directory that satisfy a certain condition. Specifically, I want to convert .flac files to .mp3. Using

find -name *.flac

I can find all the files. However, I do not see the difference between executing a command with the -exec option of find and piping the results to xargs. E.g.

find -name *.flac | xargs -i ffmpeg -i {} {}.mp3

compared to

find -name *.flac -exec ffmpeg -i {} {}.mp3 \;

Can someone point out the difference? What is better practice? What are the advantages and disadvantages?

Also: If I wanted to simultaneously delete the original file, how would I add a second command in the above code?

Best Answer

Summary:

Unless you are much more familiar with xargs than -exec, you will probably want to use -exec when you use find.

Since xargs is a separate program, calling it is likely to be marginally less efficient than using -exec, which is a feature of the find program. We don't usually want to call an extra program if it doesn't provide any additional benefit in terms of reliability, performance or readability. Since find ... -exec ... can also run a command with a list of arguments where possible (as xargs does), there is not really any advantage to using xargs with find over -exec. In the case of ffmpeg, we have to specify both an input and an output file for each conversion, so neither method can gain performance by constructing an argument list, and with xargs it is more difficult to strip the original extension so that the output is not named something illogical like file.flac.mp3.

What xargs does

Note: The verbose flag (which prints the constructed command with its arguments) in xargs is -t, and the interactive flag (which causes the user to be prompted for confirmation to operate on each argument) is -p. You may find both of these useful for understanding and testing its behaviour.

xargs attempts to turn its STDIN (typically the STDOUT of the previous command that has been piped to it) into a list of arguments to some command.

command1 | xargs command2 [output of command1 will be appended here]

Since STDIN and STDOUT are just streams of text (this is also why you shouldn't parse the output of ls), xargs is easily tripped up. By default it treats spaces, tabs and newlines as argument delimiters (and also interprets quotes and backslashes). Filenames are allowed to contain spaces and may even contain newlines, and such filenames will cause unexpected behaviour. Let's say you have a file called foo bar. When a list containing this filename is piped to xargs, it attempts to run the given command on foo and on bar.
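
The splitting can be seen without touching any real files. This is a minimal sketch in which printf stands in for a command that lists one filename; the variable name split_result is only for illustration:

```shell
# Feed xargs a single "filename" that contains a space.
# -n 1 makes xargs run echo once per argument it thinks it received.
split_result=$(printf '%s\n' 'foo bar' | xargs -n 1 echo)

# Two lines come back, because xargs saw two arguments: foo and bar.
printf '%s\n' "$split_result"
```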

The same problem occurs when you type command foo bar, and you know you can avoid it by quoting the space or the whole name, e.g. command foo\ bar or command "foo bar". But even if we were able to quote the list passed to xargs, we usually wouldn't want to, because the whole list would then be treated as a single argument. The standard solution is to use the null character as the delimiter, since filenames cannot contain it:

find path test(s) -print0 | xargs -0 command

This causes find to terminate each filename with the null character instead of a newline, and xargs to treat only the null character as a delimiter.
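
As a rough check of the null-delimited form, the following sketch creates a throwaway directory (via mktemp, an assumption about your system) containing one file with a space in its name, and uses basename as a harmless stand-in for the real command:

```shell
# Create a scratch directory with an awkwardly named file.
tmp=$(mktemp -d)
touch "$tmp/foo bar.flac"

# -print0 and -0 agree on the null character as the only delimiter,
# so the name reaches basename as one argument.
null_result=$(find "$tmp" -name '*.flac' -print0 | xargs -0 -n 1 basename)

printf '%s\n' "$null_result"
rm -rf "$tmp"
```

The name comes through intact as foo bar.flac rather than being split at the space.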

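Problems may still occur if the command doesn't accept multiple arguments. If the argument list is too long for a single command line, xargs splits it across several invocations of the command, which matters if the command needs to see every file at once.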
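
When a list does not fit on one command line, xargs runs the command more than once, each time with a batch of arguments. A small sketch of that batching, using -n 2 to force tiny batches so the effect is visible (the numbers are just placeholder arguments):

```shell
# Five arguments, at most two per invocation of echo.
batches=$(printf '%s\n' 1 2 3 4 5 | xargs -n 2 echo)

# Each output line corresponds to one separate run of echo.
printf '%s\n' "$batches"
```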

In this case you are using ffmpeg, which expects input files to be specified first and output files last. We can tell ffmpeg explicitly which file(s) to use as input with the -i flag, but we also need to give the output filename (from which the format is usually guessed, though we can also specify it). So, to construct suitable commands, you need to use the replace-string option (-I or -i) of xargs to specify both the input and output files:

... | xargs -I{} command {} {}.out

(The documentation says that -i is deprecated for this purpose and that we should use -I instead, but I am not sure why. When using -I, you must specify the replacement string ({} is normally used) immediately after the option. With -i you can omit the replacement string, in which case {} is used by default.)

The -I option causes the input to be split only on newlines, not spaces, so if you are sure your filenames will not contain newlines, you do not have to use -print0 | xargs -0 when you use -I. If you are uncertain, you can still use the safer syntax:

find -name "*.flac" -print0 | xargs -0I{} ffmpeg -i {} {}.mp3
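
To see what -I actually builds, echo can be put in front of ffmpeg so that nothing is converted. This is only a dry-run sketch; the two filenames are made up and printf stands in for find:

```shell
# Two hypothetical filenames, one containing a space.
# With -I{}, each input line yields exactly one command.
built=$(printf '%s\n' 'a b.flac' 'c.flac' | xargs -I{} echo ffmpeg -i {} {}.mp3)

# One ffmpeg command line is printed per input file.
printf '%s\n' "$built"
```

Note that echo prints the arguments without their quoting, so a b.flac looks like two words in the output even though it was passed as one argument.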

However, the performance benefit of xargs (running a command once with a whole list of arguments) is lost here, since ffmpeg must be run once for each pair of input and output files (you can see this easily by prepending echo to ffmpeg to test the above command). This also produces an illogical filename such as song.flac.mp3 and doesn't allow you to run multiple commands. To do the latter, you can call bash, as in dessert's answer:

... | xargs -I{} bash -c 'ffmpeg -i "$1" "$1.mp3" && rm "$1"' _ {}

but renaming is tricky.

How -exec is different

When you use the -exec option to find, the found files are passed as arguments to the command after -exec. They aren't turned into text. With the syntax:

find ... -exec command {} \;

command is run once for each file found. With the syntax

find ... -exec command {} +

an argument list is constructed from the found files so that we can run the command only once (or only as many times as required) on multiple files, giving the performance benefit provided by xargs. However, since the filename arguments aren't constructed from a stream of text, using -exec doesn't have the problem xargs has of breaking on spaces and other special characters.
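
The difference between the two terminators can be checked by counting how many times the command runs. A minimal sketch, assuming mktemp is available and using echo as the command, since each run of echo prints exactly one line:

```shell
# Scratch directory with three files, one with a space in its name.
tmp=$(mktemp -d)
touch "$tmp/one.flac" "$tmp/two.flac" "$tmp/foo bar.flac"

# \; runs echo once per file, so three lines are printed.
per_file=$(find "$tmp" -name "*.flac" -exec echo {} \; | wc -l | tr -d ' ')

# + hands all three names to a single echo, so one line is printed.
batched=$(find "$tmp" -name "*.flac" -exec echo {} + | wc -l | tr -d ' ')

printf 'per file: %s run(s), batched: %s run(s)\n' "$per_file" "$batched"
rm -rf "$tmp"
```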

With ffmpeg, we can't use + for the same reason that xargs gave no performance benefit: since we need to specify both input and output, the command must be run on each file individually. We have to use some form of

find -name "*.flac" -exec ffmpeg -i {} {}.out \;

This, again, will give you a rather illogically named file (song.flac.out), so you may want to strip the original extension; dessert's answer explains how to do this with string manipulation (not easily done in xargs; another reason to use -exec). It also explains how to run multiple commands on each file, so that you can safely remove the original after a successful conversion.
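
For orientation, here is one way such a command could look. It is a sketch under my own assumptions, not necessarily the exact command from dessert's answer: the filename is passed to bash -c as $1 (with _ filling $0), the extension is stripped with ${1%.flac}, and echo keeps it a dry run against a scratch directory:

```shell
# Scratch directory with one awkwardly named file.
tmp=$(mktemp -d)
touch "$tmp/my song.flac"

# The filename arrives as $1, so it is never re-parsed as shell code.
# Remove both echoes to really convert and then delete the original.
dry_run=$(find "$tmp" -name "*.flac" -exec bash -c '
  echo ffmpeg -i "$1" "${1%.flac}.mp3" &&
  echo rm -v "$1"
' _ {} \;)

printf '%s\n' "$dry_run"
rm -rf "$tmp"
```

Because of &&, the rm step runs only if the ffmpeg step exits successfully.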

Instead of repeating dessert's recommendation, which I agree with, I will suggest an alternative to find, which allows similar flexibility to running bash -c after -exec; a bash for loop:

shopt -s globstar           # allow recursive globbing with **
for f in ./**/*.flac; do    # for all files ending with .flac
   # convert them, stripping the original extension from the new filename
   echo ffmpeg -i "$f" "${f%.flac}.mp3" &&
   echo rm -v "$f"          # if that succeeded, delete the original
done
shopt -u globstar           # turn recursive globbing off

Remove the echoes after testing to actually operate on the files.

ffmpeg doesn't recognise -- to mark the end of options, so to avoid filenames beginning with - being interpreted as options, we use ./ to indicate the current directory instead of starting with **, so that all paths begin with ./ instead of arbitrary filenames. This means we don't need to use -- with rm (which does recognise it) either.


Note: you should quote your -name test expression if it contains any wildcard characters; otherwise the shell will expand them if possible (i.e. if they match any files in the current directory) before they are passed to find. So in the first place, use

find -name "*.flac"

to prevent unexpected behaviour.
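
The effect of the missing quotes is easy to reproduce. In this sketch (the directory layout is made up for illustration), the current directory contains a.flac and a subdirectory contains b.flac:

```shell
# Scratch tree: ./a.flac and ./sub/b.flac
tmp=$(mktemp -d)
touch "$tmp/a.flac"
mkdir "$tmp/sub" && touch "$tmp/sub/b.flac"

# Unquoted: the shell expands *.flac to a.flac before find runs,
# so find only looks for files named exactly a.flac.
unquoted=$(cd "$tmp" && find . -name *.flac | sort)

# Quoted: find receives the pattern itself and matches both files.
quoted=$(cd "$tmp" && find . -name "*.flac" | sort)

printf 'unquoted:\n%s\nquoted:\n%s\n' "$unquoted" "$quoted"
rm -rf "$tmp"
```

The unquoted form silently misses sub/b.flac, which is the kind of unexpected behaviour the quoting prevents.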