ImageMagick – Using xargs with pdftk

I am using the following code to concatenate all the pdf files in the current directory:

find . -iname '*.pdf'|sort|xargs|xargs -I {} pdftk {} cat output union.pdf

The first invocation of xargs has the effect of converting the output of sort into a single line, with items separated by a space. But the result is this:

Error: Unable to find file.
Error: Failed to open PDF file: 
   ./001.pdf ./002.pdf ./003.pdf ./004.pdf ./007.pdf ./010.pdf ./031.pdf ./057.pdf ./077.pdf ./103.pdf ./131.pdf ./155.pdf ./179.pdf ./205.pdf ./233.pdf ./261.pdf ./285.pdf ./313.pdf ./331.pdf ./357.pdf ./383.pdf ./411.pdf
Errors encountered.  No output created.
Done.  Input errors, so no output created.

Does xargs pass the argument to pdftk with surrounding quotes? How to prevent this? (Whitespaces, escaping and the way they interact with commands always drive me crazy…)

Best Answer

Does xargs pass the argument to pdftk with surrounding quotes?

Yes and no, but technically no. xargs does no quoting, and pdftk does no unquoting either.

On Linux/Unix, programs don't receive their command-line arguments as a single string that needs to be quoted and unquoted. Quoting is just part of the user-facing "command shell" language, and quotes are interpreted by your shell, not by the programs themselves. (Windows is the opposite: there, a program receives its whole command line as one string and has to split and unquote it itself.)

Internally, programs are started with an array (list, vector) of strings, which inherently preserves the exact text and the boundaries of every element, so no quoting or escaping is involved in the first place. (That is, unless you have to nest one command line inside another, in which case it's back to string quoting and parsing, as you'll see below.)
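You can watch the shell do this work by giving it a small stand-in for a real program. The function `show_args` below is a hypothetical helper written only for this illustration; it prints each element of the argument array it receives in brackets, so you can see where the boundaries are after the shell has consumed the quotes:

```sh
#!/bin/sh
# show_args is a hypothetical helper for this illustration: it prints each
# argument it receives on its own line, wrapped in brackets, so the
# boundaries between elements are visible.
show_args() {
    printf '[%s]\n' "$@"
}

# The shell removes the quotes and passes two elements: "a b" and "c".
show_args "a b" c
# [a b]
# [c]

# Without quotes, the shell splits at each space: three elements.
show_args a b c
# [a]
# [b]
# [c]
```

The quotes never reach `show_args`; only the element boundaries they created do.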

For example, your command line is parsed into this (using C-like array syntax for example, but the quotes aren't actually part of the strings):

1. {"find", ".", "-iname", "*.pdf", NULL}
2. {"sort", NULL}
3. {"xargs", NULL}
4. {"xargs", "-I", "{}", "pdftk", "{}", "cat", "output", "union.pdf", NULL}
                         └─xargs uses these elements as the command─┘

So when xargs reads a line of input (because -I switches it to line-by-line mode), it replaces the string {} inside each individual element with that input line, without rearranging or splitting the elements in any way. Then it asks the OS to run the result:

{"pdftk", "./001.pdf ./002.pdf ./003.pdf …", "cat", "output", "union.pdf", NULL}
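The same substitution can be reproduced without pdftk. In this sketch, printf stands in for pdftk and the three filenames are made up; xargs runs printf with the argument array it built, and printf prints each element in brackets:

```sh
#!/bin/sh
# printf stands in for pdftk: it prints each argument it receives on its
# own line, wrapped in brackets. The filenames are made up for the demo.
printf '%s\n' ./001.pdf ./002.pdf ./003.pdf |
    xargs |
    xargs -I {} printf '[%s]\n' {} cat output union.pdf
# [./001.pdf ./002.pdf ./003.pdf]
# [cat]
# [output]
# [union.pdf]
```

The whole joined line arrives as one element, which is the single "file" that pdftk reported it couldn't find.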

So you'll need a different way to achieve this than xargs -I alone.

  • You could, for example, ask xargs to run a shell – which will then interpret/split/unquote the input the same way that you'd expect from a shell:

    find … | sort | xargs | xargs -I {} bash -c "pdftk {} cat output union.pdf"
    

    The element following -c will become pdftk ./001.pdf ./002.pdf … cat output union.pdf, and bash will split it into words as expected. (But note that because xargs doesn't add any quoting, this will split up filenames that happen to contain spaces, and bash will also interpret any shell metacharacters in filenames, such as quotes, $ or *.)

  • You could use the shell's "command substitution" feature:

    pdftk $(find … | sort) cat output union.pdf
    

    This will split the resulting text at any whitespace (just like an unquoted $var expansion), so the lines don't need to be joined first. But it has the same problem with filenames containing spaces, and slightly fewer issues with special characters.

  • Recommended: You could avoid 'find' and 'xargs' entirely and use the shell's built-in wildcard matching (globbing) directly:

    pdftk *.pdf cat output union.pdf
    

    Ordinary * isn't recursive, but Bash (version 4.0 and later) and zsh also support **, which matches recursively:

    shopt -s globstar                       # enable the feature (only needed in bash)
    
    pdftk **/*.pdf cat output union.pdf
    

    (The match results will always be sorted, at least in shells using the POSIX sh language. And because the shell expands each filename directly into an individual command-line element, there will be no quoting issues at all, even with unusual filenames.)
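To compare the three approaches side by side, here is a throwaway sketch under a few assumptions: it works in a temporary directory created with mktemp -d, uses two made-up filenames (one containing a space), substitutes printf for pdftk so no PDFs are needed, and calls bash only for the first approach:

```sh
#!/bin/sh
# Throwaway comparison of the three approaches above on a filename that
# contains a space. The filenames are made up, and printf stands in for
# pdftk: it prints every argument it receives in brackets.
set -eu
demo=$(mktemp -d)
trap 'rm -rf "$demo"' EXIT
cd "$demo"
touch a.pdf 'my file.pdf'

# 1. xargs running bash -c: bash re-splits the joined line at spaces.
via_bash=$(find . -iname '*.pdf' | sort | xargs |
    xargs -I {} bash -c "printf '[%s]\n' {} cat output union.pdf")

# 2. Command substitution: the unquoted $(...) is split at whitespace.
via_subst=$(printf '[%s]\n' $(find . -iname '*.pdf' | sort) cat output union.pdf)

# 3. Globbing: each matching filename becomes exactly one argument.
via_glob=$(printf '[%s]\n' *.pdf cat output union.pdf)

printf '%s\n\n' "$via_bash" "$via_subst" "$via_glob"
```

In the first two results, ./my file.pdf arrives as two arguments, ./my and file.pdf, neither of which exists; only the glob keeps it whole.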
