Below are a dozen or so examples of how you can take a file such as this:
$ cat k.txt
1
2
3
and convert it to this format:
1,2,3
You can use this command to create the above file if you'd like to play along:
$ cat <<EOF > k.txt
1
2
3
EOF
The examples below are split into 2 groups: ones that "work" and ones that "almost" work. I leave the latter in because it's often just as valuable to see why something doesn't work as it is to see why something does.
Most scripting languages that I'm familiar with are represented. Some are represented multiple times since, as the famous acronym typically referenced in Perl goes, TIMTOWTDI.
NOTE: You can swap out the comma (,) in the examples below and replace it with whatever character you want, e.g. |.
Examples that "work"
These code snippets will produce the desired output.
The paste command:
$ paste -s -d ',' k.txt
1,2,3
The sed command:
$ sed ':a;N;$!ba;s/\n/,/g' k.txt
1,2,3
$ sed ':a;{N;s/\n/,/};ba' k.txt
1,2,3
The perl command:
$ perl -00 -p -e 's/\n(?!$)/,/g' k.txt
1,2,3
$ perl -00 -p -e 'chomp;tr/\n/,/' k.txt
1,2,3
The awk command:
$ awk '{printf"%s%s",c,$0;c=","}' k.txt
1,2,3
$ awk '{printf "%s,",$0}' k.txt | awk '{sub(/\,$/,"");print}'
1,2,3
$ awk -vORS=, 1 k.txt | awk '{sub(/\,$/,"");print}'
1,2,3
$ awk 'BEGIN {RS="dn"}{gsub("\n",",");print $0}' k.txt | awk '{sub(/\,$/,"");print}'
1,2,3
The python command:
$ python -c "import sys; print sys.stdin.read().replace('\n', ',')[0:-1]" <k.txt
1,2,3
$ python -c "import sys; print sys.stdin.read().replace('\n', ',').rstrip(',')" <k.txt
1,2,3
Bash's mapfile built-in:
$ mapfile -t a < k.txt; (IFS=','; echo "${a[*]}")
1,2,3
The ruby command:
$ ruby -00 -pe 'gsub /\n/,",";chop' < k.txt
1,2,3
$ ruby -00 -pe '$_.chomp!"\n";$_.tr!"\n",","' k.txt
1,2,3
The php command:
$ php -r 'echo strtr(chop(file_get_contents($argv[1])),"\n",",");' k.txt
1,2,3
Caveats
Most of the examples above will work just fine. Some have hidden issues, such as the PHP example above. The function chop() is actually an alias of rtrim(), so the last line's trailing spaces will also be removed.
The first Ruby example and the first Python example have hidden issues too. The problem is that they all rely on an operation that blindly "chops" off a trailing character. This is fine for the example that the OP provided, but care must be taken when using these types of one-liners to make sure that they fit the data they're processing.
Example
Say our sample file, k.txt, looked like this instead:
$ echo -en "1\n2\n3" > k.txt
It looks similar but it has one slight difference: it doesn't have a trailing newline (\n) like the original file. Now when we run the first Python example we get this:
$ python -c "import sys; print sys.stdin.read().replace('\n', ',')[0:-1]" <k.txt
1,2
With no trailing comma to remove, the [0:-1] slice chops off the last character of the data instead.
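One way to sidestep the trailing-newline problem, sketched here in Python 3 rather than the Python 2 used above, is to split the text into lines first and join them afterwards. join_lines is a name made up for this sketch, not something from the examples above.

```python
def join_lines(text, sep=","):
    # splitlines() returns the same list whether or not the text ends
    # with a newline, so nothing ever has to be chopped off blindly.
    return sep.join(text.splitlines())

print(join_lines("1\n2\n3\n"))     # 1,2,3
print(join_lines("1\n2\n3"))       # 1,2,3
print(join_lines("1\n2\n3", "|"))  # 1|2|3
```

Note that splitlines() also splits on a few other line-boundary characters (\r, for instance), which may or may not be what you want for your data.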
Examples that "almost" work
These are the "always a bridesmaid, never a bride" examples. Most of them could probably be adapted, but when a potential solution to a problem feels "forced", it's probably the wrong tool for the job!
The perl command:
$ perl -p -e 's/\n/,/' k.txt
1,2,3,
The tr command:
$ tr '\n' ',' < k.txt
1,2,3,
The cat + echo commands:
$ echo $(cat k.txt)
1 2 3
The ruby command:
$ ruby -pe '$_["\n"]=","' k.txt
1,2,3,
Bash's while + read built-ins:
$ while read line; do echo -n "$line,"; done < k.txt
1,2,3,
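Most of these near misses write the separator after every line, including the last one. A rough Python sketch, using made-up function names, contrasts that with the approach of the first awk example in the "work" group, which writes the separator before every line except the first.

```python
import io

def join_trailing(lines, sep=","):
    # Mirrors tr and the while/read loop: the separator follows every
    # line, so one is left dangling at the end.
    out = io.StringIO()
    for line in lines:
        out.write(line + sep)
    return out.getvalue()

def join_leading(lines, sep=","):
    # Mirrors awk '{printf"%s%s",c,$0;c=","}': the prefix starts out
    # empty and only becomes the separator once a line has been written.
    out = io.StringIO()
    prefix = ""
    for line in lines:
        out.write(prefix + line)
        prefix = sep
    return out.getvalue()

lines = ["1", "2", "3"]
print(join_trailing(lines))  # 1,2,3,
print(join_leading(lines))   # 1,2,3
```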
I am not at all convinced of this, but let's suppose for the sake of argument that you could, if you're prepared to put in enough effort, parse the output of ls reliably, even in the face of an "adversary": someone who knows the code you wrote and is deliberately choosing filenames designed to break it.
Even if you could do that, it would still be a bad idea.
Bourne shell is not a good language. It should not be used for anything complicated, unless extreme portability is more important than any other factor (e.g. autoconf).
I claim that if you're faced with a problem where parsing the output of ls seems like the path of least resistance for a shell script, that's a strong indication that whatever you are doing is too complicated for shell and you should rewrite the entire thing in Perl or Python. Here's your last program in Python:
import os, sys
for subdir, dirs, files in os.walk("."):
    for f in dirs + files:
        ino = os.lstat(os.path.join(subdir, f)).st_ino
        sys.stdout.write("%d %s %s\n" % (ino, subdir, f))
This has no issues whatsoever with unusual characters in filenames; the output is ambiguous in the same way the output of ls is ambiguous, but that wouldn't matter in a "real" program (as opposed to a demo like this), which would use the result of os.path.join(subdir, f) directly.
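As a sketch of what using the result of os.path.join(subdir, f) directly might look like, the following groups paths by device and inode number, so that hard links to the same file end up together. The function name paths_by_inode and the grouping task itself are assumptions made for illustration; they are not part of the program above.

```python
import os
from collections import defaultdict

def paths_by_inode(top="."):
    # Hypothetical helper: map (device, inode) to every path that has it.
    # The device is included because inode numbers are only unique
    # within a single filesystem.
    groups = defaultdict(list)
    for subdir, dirs, files in os.walk(top):
        for f in dirs + files:
            path = os.path.join(subdir, f)
            # The path goes straight to lstat(); it is never printed and
            # parsed again, so odd characters in names cannot break it.
            st = os.lstat(path)
            groups[(st.st_dev, st.st_ino)].append(path)
    return groups
```

Calling paths_by_inode(".") and keeping only the entries with more than one path would list the hard-linked names under the current directory.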
Equally important, and in stark contrast to the thing you wrote, it will still make sense six months from now, and it will be easy to modify when you need it to do something slightly different. By way of illustration, suppose you discover a need to exclude dotfiles and editor backups, and to process everything in alphabetical order by basename:
import os, sys
filelist = []
for subdir, dirs, files in os.walk("."):
    for f in dirs + files:
        if f[0] == '.' or f[-1] == '~': continue
        lstat = os.lstat(os.path.join(subdir, f))
        filelist.append((f, subdir, lstat.st_ino))
filelist.sort(key = lambda x: x[0])
for f, subdir, ino in filelist:
    sys.stdout.write("%d %s %s\n" % (ino, subdir, f))
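One thing the version above does not do is stop os.walk from descending into the directories it skips, so the contents of a hidden directory would still be listed. If that is not wanted (an assumption here; the stated requirement doesn't say either way), os.walk's documented behaviour of honouring in-place changes to dirs during a top-down walk can be used. wanted and list_sorted are hypothetical names.

```python
import os

def wanted(name):
    # Same rule as above: skip dotfiles and editor backups.
    return not (name.startswith(".") or name.endswith("~"))

def list_sorted(top="."):
    entries = []
    for subdir, dirs, files in os.walk(top):
        # Assigning to dirs[:] edits the list in place, which tells
        # os.walk (top-down is the default) not to descend into the
        # directories that were filtered out.
        dirs[:] = [d for d in dirs if wanted(d)]
        for f in dirs + files:
            if wanted(f):
                ino = os.lstat(os.path.join(subdir, f)).st_ino
                entries.append((f, subdir, ino))
    entries.sort(key=lambda e: e[0])  # alphabetical by basename
    return entries
```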
Best Answer
The POSIX specification does give you an example for that:
(with filenames being arbitrary sequences of bytes (other than / and NUL) and sed/xargs expecting text, you'd also need to fix the locale to C (where all non-NUL bytes would make valid characters) to make that reliable (except for xargs implementations that have a very low limit on the maximum length of an argument))
The -E '' is needed for some xargs implementations that, without it, would understand a _ argument to signify the end of input (where echo a _ b | xargs outputs a only, for instance).
With GNU xargs, you can use:
GNU xargs also has a -0 that has been copied by a few other implementations, so:
is slightly more portable.
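For comparison, here is a minimal Python sketch that produces NUL-delimited names of the kind a consumer such as xargs -0 expects. It bypasses ls altogether, which is a different technique from the pipelines discussed here, and the function name nul_delimited_names is made up for the sketch.

```python
import os
import sys

def nul_delimited_names(directory="."):
    # Given a bytes path, os.listdir() returns the names as raw bytes,
    # so names that are not valid text in the locale pass through as-is.
    names = sorted(os.listdir(os.fsencode(directory)))
    # NUL cannot occur in a filename, so it is an unambiguous
    # terminator, even for names that contain newlines.
    return b"".join(name + b"\0" for name in names)

if __name__ == "__main__":
    # Write bytes directly; printing text would risk re-encoding them.
    sys.stdout.buffer.write(nul_delimited_names("."))
```

Saved under some name of your choosing, its output could be piped into xargs -0 on systems where that option is available.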
All of those assume the file names don't contain newline characters. If there may be filenames with newline characters, the output of ls is simply not post-processable. If you get:
That can be either two files, a and b, or one file called a<newline>b; there's no way to tell.
GNU ls has a --quoting-style=shell-always which makes its output unambiguous and could be post-processable, but the quoting is not compatible with the quoting expected by xargs. xargs recognises the "...", \x and '...' forms of quoting. But both "..." and '...' are strong quotes and can't contain newline characters (only \ can escape newline characters for xargs), so that's not compatible with sh quoting, where only '...' are strong quotes (and can contain newline characters) but \<newline> is a line continuation (it is removed) instead of an escaped newline.
You can use the shell to parse that output and then output it in a format expected by xargs: