Shell – Properly escaping output from pipe in xargs

ls quoting shell xargs

Example:

% touch -- safe-name -name-with-dash-prefix "name with space" \
    'name-with-double-quote"' "name-with-single-quote'" \
    'name-with-backslash\'

xargs can't seem to handle double quotes:

% ls | xargs ls -l 
xargs: unmatched double quote; by default quotes are special to xargs unless you use the -0 option
ls: invalid option -- 'e'
Try 'ls --help' for more information.

If we use the -0 option, it has trouble with the name that has a dash prefix:

% ls -- * | xargs -0 -- ls -l --
ls: invalid option -- 'e'
Try 'ls --help' for more information.

And this is before trying other potentially problematic characters such as newlines and control characters.

Best Answer

The POSIX specification does give you an example for that:

ls | sed -e 's/"/"\\""/g' -e 's/.*/"&"/' | xargs -E '' printf '<%s>\n'

Filenames are arbitrary sequences of bytes (other than / and NUL), while sed and xargs expect text, so you would also need to set the locale to C (where every non-NUL byte is a valid character) to make that reliable, except with xargs implementations that have a very low limit on the maximum length of an argument.

The -E '' is needed for some xargs implementations that, without it, would treat a _ argument as the end of input (for instance, echo a _ b | xargs would output only a).
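To make the two sed expressions easier to follow, here is a minimal Python sketch of the same per-line transformation. The name xargs_quote is made up for illustration, and the sketch only covers the quoting step, not xargs itself:

```python
def xargs_quote(name: str) -> str:
    """Quote one newline-free name the way the two sed expressions do."""
    if "\n" in name:
        # Quotes cannot protect a newline for xargs, so refuse such names.
        raise ValueError("name contains a newline")
    # First expression: every " becomes "\"" (close quote, escaped quote, reopen).
    escaped = name.replace('"', '"\\""')
    # Second expression: wrap the whole line in double quotes.
    return '"' + escaped + '"'


names = ["safe-name", "name with space", 'name-with-double-quote"',
         "name-with-single-quote'", "name-with-backslash\\"]
quoted_lines = [xargs_quote(n) for n in names]
```

Each element of quoted_lines is one line as the sed pipeline would hand it to xargs.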

With GNU xargs, you can use:

ls | xargs -d '\n' printf '<%s>\n'

GNU xargs also has a -0 option that has been copied by a few other implementations, so:

ls | tr '\n' '\0' | xargs -0 printf '<%s>\n'

is slightly more portable.

All of those assume the file names don't contain newline characters. If there may be filenames with newline characters, the output of ls is simply not post-processable. If you get:

a
b

That can be either two files, a and b, or one file called a<newline>b; there's no way to tell.
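The ambiguity, and why NUL-terminated records avoid it, can be shown with a small Python sketch. The helper join_records is hypothetical and simply terminates every name with the chosen separator:

```python
def join_records(names, terminator):
    """Concatenate names, ending each one with the given terminator byte."""
    return b"".join(name + terminator for name in names)


two_files = [b"a", b"b"]     # two files called a and b
one_file = [b"a\nb"]         # one file called a<newline>b

# Newline-terminated output is byte-for-byte identical in both cases.
ambiguous = join_records(two_files, b"\n") == join_records(one_file, b"\n")

# NUL cannot appear in a filename, so NUL-terminated output stays distinct.
distinct = join_records(two_files, b"\0") != join_records(one_file, b"\0")
```

Both ambiguous and distinct come out true, which is the reason the -0 form is the only one that survives newlines in names.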

GNU ls has a --quoting-style=shell-always option which makes its output unambiguous and post-processable, but that quoting is not compatible with the quoting xargs expects. xargs recognises the "...", \x and '...' forms of quoting. For xargs, both "..." and '...' are strong quotes and cannot contain newline characters (only \ can escape a newline). In sh quoting, by contrast, only '...' are strong quotes (and they can contain newline characters), while \<newline> is a line continuation that is removed rather than an escaped newline.

You can use the shell to parse that output and then output it in a format expected by xargs:

eval "files=($(ls --quoting-style=shell-always))"
# skip xargs entirely when the array is empty
[ "${#files[@]}" -eq 0 ] || printf '%s\0' "${files[@]}" |
  xargs -0 printf '<%s>\n'
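The same idea (let an sh-aware parser undo the quoting, then emit NUL-terminated records) can be sketched in Python with shlex.split. This is only an approximation: shlex implements a subset of sh quoting, and the sample string below is hand-written sh-quoted text, not captured ls output.

```python
import shlex

# Hand-written sh-quoted names (an assumption, not real ls output):
# one name containing a newline, one containing a single quote, one plain.
quoted = "'a\nb' 'it'\\''s' 'plain'"

# Undo the sh quoting; the newline inside '...' stays part of the name.
names = shlex.split(quoted)

# Same role as printf '%s\0': every name is followed by a NUL.
payload = "".join(name + "\0" for name in names)
```

Here payload is what would be piped into xargs -0.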

Related Solutions

How to Format Output of Xargs Command

Examples that "work"

These code snippets will produce the desired output, given a file k.txt that contains the lines 1, 2 and 3, each terminated by a newline.

The paste command:

$ paste -s -d ',' k.txt 
1,2,3

The sed command:

$ sed ':a;N;$!ba;s/\n/,/g' k.txt
1,2,3

$ sed ':a;{N;s/\n/,/};ba' k.txt 
1,2,3

The perl command:

$ perl -00 -p -e 's/\n(?!$)/,/g' k.txt
1,2,3

$ perl -00 -p -e 'chomp;tr/\n/,/' k.txt
1,2,3

The awk command:

$ awk '{printf"%s%s",c,$0;c=","}' k.txt
1,2,3

$ awk '{printf "%s,",$0}' k.txt | awk '{sub(/\,$/,"");print}'
1,2,3

$ awk -vORS=, 1 k.txt | awk '{sub(/\,$/,"");print}'
1,2,3

$ awk 'BEGIN {RS="dn"}{gsub("\n",",");print $0}' k.txt | awk '{sub(/\,$/,"");print}'
1,2,3

The python command (Python 2 print syntax):

$ python -c "import sys; print sys.stdin.read().replace('\n', ',')[0:-1]" <k.txt
1,2,3

$ python -c "import sys; print sys.stdin.read().replace('\n', ',').rstrip(',')" <k.txt
1,2,3

Bash's mapfile built-in:

$ mapfile -t a < k.txt; (IFS=','; echo "${a[*]}")
1,2,3

The ruby command:

$ ruby -00 -pe 'gsub /\n/,",";chop' < k.txt
1,2,3

$ ruby -00 -pe '$_.chomp!"\n";$_.tr!"\n",","' k.txt
1,2,3

The php command:

$ php -r 'echo strtr(chop(file_get_contents($argv[1])),"\n",",");' k.txt
1,2,3

Caveats

Most of the examples above will work just fine, but some have hidden issues. The PHP example is one: chop() is actually an alias of rtrim(), so trailing whitespace on the last line is removed as well.

The same goes for the first Ruby example and the first Python example. All of them rely on an operation that blindly "chops" off a trailing character. This is fine for the example that the OP provided, but care must be taken with these kinds of one-liners to make sure that they suit the data they're processing.
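The rtrim() effect can be reproduced with a short Python sketch. Python's rstrip() with no argument is used here as a stand-in for rtrim(); the two do not strip exactly the same character set, so treat this as an approximation:

```python
text = "1\n2\n3  \n"   # the last line ends with two spaces

# Greedy: strip all trailing whitespace first, as a chop()/rtrim()-style call does.
greedy = text.rstrip().replace("\n", ",")

# Exact: remove one trailing newline only, so the spaces survive.
body = text[:-1] if text.endswith("\n") else text
exact = body.replace("\n", ",")
```

The greedy version silently drops the two spaces that belonged to the last line, while the exact version keeps them.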

Example

Say our sample file, k.txt looked like this instead:

$ echo -en "1\n2\n3" > k.txt

It looks similar, but with one slight difference: it doesn't have a trailing newline (\n) like the original file. Now when we run the first Python example we get this:

$ python -c "import sys; print sys.stdin.read().replace('\n', ',')[0:-1]" <k.txt
1,2

Examples that "almost" work

These are the "always a bridesmaid, never a bride" examples. Most of them could probably be adapted, but when a potential solution to a problem feels "forced", it's probably the wrong tool for the job!

The perl command:

$ perl -p -e 's/\n/,/' k.txt
1,2,3,

The tr command:

$ tr '\n' ','  < k.txt 
1,2,3,

The cat + echo commands:

$ echo $(cat k.txt)
1 2 3

The ruby command:

$ ruby -pe '$_["\n"]=","' k.txt
1,2,3,

Bash's while + read built-ins:

$ while read line; do echo -n "$line,"; done < k.txt
1,2,3,

Why Not to Parse ls Command and What to Use Instead

I am not at all convinced of this, but let's suppose for the sake of argument that you could, if you're prepared to put in enough effort, parse the output of ls reliably, even in the face of an "adversary" — someone who knows the code you wrote and is deliberately choosing filenames designed to break it.

Even if you could do that, it would still be a bad idea.

Bourne shell is not a good language. It should not be used for anything complicated, unless extreme portability is more important than any other factor (e.g. autoconf).

I claim that if you're faced with a problem where parsing the output of ls seems like the path of least resistance for a shell script, that's a strong indication that whatever you are doing is too complicated for shell and you should rewrite the entire thing in Perl or Python. Here's your last program in Python:

import os, sys
for subdir, dirs, files in os.walk("."):
    for f in dirs + files:
        ino = os.lstat(os.path.join(subdir, f)).st_ino
        sys.stdout.write("%d %s %s\n" % (ino, subdir, f))

This has no issues whatsoever with unusual characters in filenames. The output is ambiguous in the same way the output of ls is ambiguous, but that wouldn't matter in a "real" program (as opposed to a demo like this), which would use the result of os.path.join(subdir, f) directly.
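As a sketch of what using the path directly might look like, the loop can be wrapped in a generator that hands back the joined path instead of printing it. The name iter_inodes is hypothetical:

```python
import os


def iter_inodes(top):
    """Yield (inode, path) pairs for every directory and file below top."""
    for subdir, dirs, files in os.walk(top):
        for name in dirs + files:
            path = os.path.join(subdir, name)
            # lstat so that symlinks are reported themselves, not their targets
            yield os.lstat(path).st_ino, path
```

A caller would then write something like for ino, path in iter_inodes("."): and pass path straight to open() or os.rename(), so no name is ever printed and re-parsed.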

Equally important, and in stark contrast to the thing you wrote, it will still make sense six months from now, and it will be easy to modify when you need it to do something slightly different. By way of illustration, suppose you discover a need to exclude dotfiles and editor backups, and to process everything in alphabetical order by basename:

import os, sys
filelist = []
for subdir, dirs, files in os.walk("."):
    for f in dirs + files:
        if f[0] == '.' or f[-1] == '~': continue
        lstat = os.lstat(os.path.join(subdir, f))
        filelist.append((f, subdir, lstat.st_ino))

filelist.sort(key = lambda x: x[0])
for f, subdir, ino in filelist:
    sys.stdout.write("%d %s %s\n" % (ino, subdir, f))
