Below are a dozen or so examples of how you can take a file such as this:
$ cat k.txt
1
2
3
and convert it to this format:
1,2,3
You can use this command to create the above file if you'd like to play along:
$ cat <<EOF > k.txt
1
2
3
EOF
The examples below are split into two groups: ones that "work" and ones that "almost" work. I've kept the latter because it's often just as valuable to see why something doesn't work as it is to see why something does.
Most scripting languages that I'm familiar with are represented. Some appear more than once since, as the famous Perl acronym goes, TIMTOWTDI (there's more than one way to do it).
NOTE: You can swap out the comma (,) in the examples below and replace it with whatever characters you want, e.g. |.
Examples that "work"
These code snippets will produce the desired output.
The paste command:
$ paste -s -d ',' k.txt
1,2,3
The sed command:
$ sed ':a;N;$!ba;s/\n/,/g' k.txt
1,2,3
$ sed ':a;{N;s/\n/,/};ba' k.txt
1,2,3
The perl command:
$ perl -00 -p -e 's/\n(?!$)/,/g' k.txt
1,2,3
$ perl -00 -p -e 'chomp;tr/\n/,/' k.txt
1,2,3
The awk command:
$ awk '{printf"%s%s",c,$0;c=","}' k.txt
1,2,3
$ awk '{printf "%s,",$0}' k.txt | awk '{sub(/\,$/,"");print}'
1,2,3
$ awk -vORS=, 1 k.txt | awk '{sub(/\,$/,"");print}'
1,2,3
$ awk 'BEGIN {RS="dn"}{gsub("\n",",");print $0}' k.txt | awk '{sub(/\,$/,"");print}'
1,2,3
The python command:
$ python -c "import sys; print(sys.stdin.read().replace('\n', ',')[0:-1])" <k.txt
1,2,3
$ python -c "import sys; print(sys.stdin.read().replace('\n', ',').rstrip(','))" <k.txt
1,2,3
Bash's mapfile built-in:
$ mapfile -t a < k.txt; (IFS=','; echo "${a[*]}")
1,2,3
The ruby command:
$ ruby -00 -pe 'gsub /\n/,",";chop' < k.txt
1,2,3
$ ruby -00 -pe '$_.chomp!"\n";$_.tr!"\n",","' k.txt
1,2,3
The php command:
$ php -r 'echo strtr(chop(file_get_contents($argv[1])),"\n",",");' k.txt
1,2,3
Caveats
Most of the examples above will work just fine. Some have hidden issues, such as the PHP example above. The function chop() is actually an alias of rtrim(), so the last line's trailing spaces will also be removed.
The first Ruby example and the first Python example have a similar problem. The issue is that they all rely on an operation that blindly "chops" off a trailing character. This is fine for the example that the OP provided, but care must be taken when using these types of one-liners to make sure that they suit the data they're processing.
Example
Say our sample file, k.txt, looked like this instead:
$ echo -en "1\n2\n3" > k.txt
It looks similar but it has one slight difference: it doesn't have a trailing newline (\n) like the original file. Now when we run the first Python example we get this:
$ python -c "import sys; print(sys.stdin.read().replace('\n', ',')[0:-1])" <k.txt
1,2,
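One way to avoid depending on the trailing newline is to write the separator before every line except the first, which is what the first awk example does. Here's a minimal sketch of that idea run against both variants of the sample file. The helper name join_lines and the temporary file names are made up for this illustration.

```shell
# join_lines FILE [SEP]: hypothetical helper, not part of any standard tool.
# Prints the lines of FILE joined by SEP (default ","), then one newline.
join_lines() {
    # The separator is written *before* every line except the first, so
    # nothing has to be chopped off the end afterwards.
    awk -v sep="${2:-,}" 'NR > 1 { printf "%s", sep } { printf "%s", $0 } END { print "" }' "$1"
}

tmp=$(mktemp -d)
printf '1\n2\n3\n' > "$tmp/with_newline.txt"     # like the original k.txt
printf '1\n2\n3'   > "$tmp/without_newline.txt"  # like the echo -en variant

join_lines "$tmp/with_newline.txt"        # 1,2,3
join_lines "$tmp/without_newline.txt"     # 1,2,3
join_lines "$tmp/with_newline.txt" '|'    # 1|2|3
```

Since awk treats a final line without a newline as a complete record, both files give the same result.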
Examples that "almost" work
These are the "always a bridesmaid, never a bride" examples. Most of them could probably be adapted, but when a potential solution to a problem feels "forced", it's probably the wrong tool for the job!
The perl command:
$ perl -p -e 's/\n/,/' k.txt
1,2,3,
The tr command:
$ tr '\n' ',' < k.txt
1,2,3,
The cat + echo commands:
$ echo $(cat k.txt)
1 2 3
The ruby command:
$ ruby -pe '$_["\n"]=","' k.txt
1,2,3,
Bash's while + read built-ins:
$ while read line; do echo -n "$line,"; done < k.txt
1,2,3,
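To give an idea of what "adapting" one of these looks like, here's a rough sketch of the while + read loop reworked so it doesn't leave a trailing comma. The function name join_with_read is made up for this illustration, and this is just one possible adaptation. It also shows how much extra scaffolding the loop needs compared to paste.

```shell
# join_with_read FILE [SEP]: hypothetical helper adapting the while/read loop.
# The body runs in a subshell ( ... ) so its variables stay private.
join_with_read() (
    sep=${2:-,}
    out=
    first=1
    # '|| [ -n "$line" ]' also picks up a final line that lacks a newline;
    # IFS= and -r keep leading blanks and backslashes intact.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "$first" -eq 1 ]; then
            out=$line
            first=0
        else
            out=$out$sep$line
        fi
    done < "$1"
    printf '%s\n' "$out"
)

sample=$(mktemp)
printf '1\n2\n3\n' > "$sample"
join_with_read "$sample"          # 1,2,3
```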
The POSIX specification does give you an example for that:
ls | sed -e 's/"/"\\""/g' -e 's/.*/"&"/' | xargs -E '' printf '<%s>\n'
(With filenames being arbitrary sequences of bytes (other than / and NUL) and sed/xargs expecting text, you'd also need to fix the locale to C (where all non-NUL bytes would make valid characters) to make that reliable, except for xargs implementations that have a very low limit on the maximum length of an argument.)
The -E '' is needed for some xargs implementations that, without it, would understand a _ argument to signify the end of input (where echo a _ b | xargs outputs only a, for instance).
With GNU xargs, you can use:
ls | xargs -d '\n' printf '<%s>\n'
GNU xargs also has a -0 option that has been copied by a few other implementations, so:
ls | tr '\n' '\0' | xargs -0 printf '<%s>\n'
is slightly more portable.
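As a quick sanity check of the tr + xargs -0 variant, here's a small sketch run against a scratch directory whose file names contain a space, a single quote and double quotes (but no newlines). The function name list_bracketed and the file names are made up for the demo, and it assumes an xargs that supports -0.

```shell
# list_bracketed DIR: hypothetical helper wrapping the tr + xargs -0 pipeline.
# The body runs in a subshell ( ... ) so the cd does not leak out.
list_bracketed() (
    cd "$1" || exit 1
    # LC_ALL=C only makes the sort order of ls predictable for the demo.
    LC_ALL=C ls | tr '\n' '\0' | xargs -0 printf '<%s>\n'
)

demo_dir=$(mktemp -d)
touch "$demo_dir/a b" "$demo_dir/it's" "$demo_dir/say \"hi\""

list_bracketed "$demo_dir"
# <a b>
# <it's>
# <say "hi">
```

With -0 the quotes and blanks are passed through untouched, since NUL is the only delimiter xargs looks for.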
All of those assume the file names don't contain newline characters. If there may be filenames with newline characters, the output of ls is simply not post-processable. If you get:
a
b
that can be either two files, a and b, or a single file called a<newline>b; there's no way to tell.
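The ambiguity is easy to reproduce. The sketch below builds both cases in scratch directories (the variable names are made up for the demo) and compares the two listings. It assumes an ls that writes names unmodified when its output is not a terminal, which is what GNU ls does by default.

```shell
one_file=$(mktemp -d)
two_files=$(mktemp -d)

# A single file whose name contains a newline (the quotes span two lines).
touch "$one_file/a
b"
# Two separate files named a and b.
touch "$two_files/a" "$two_files/b"

# The output is captured rather than sent to a terminal, so ls prints the
# names as-is.
listing_one=$(cd "$one_file" && ls)
listing_two=$(cd "$two_files" && ls)

if [ "$listing_one" = "$listing_two" ]; then
    echo "same listing"
fi
```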
GNU ls has a --quoting-style=shell-always option which makes its output unambiguous and post-processable, but the quoting is not compatible with the quoting expected by xargs. xargs recognises the "...", \x and '...' forms of quoting. But both "..." and '...' are strong quotes and can't contain newline characters (only \ can escape newline characters for xargs), so that's not compatible with sh quoting, where only '...' are strong quotes (and can contain newline characters) but \<newline> is a line continuation (it is removed) instead of an escaped newline.
You can use the shell to parse that output and then output it in a format expected by xargs:
eval "files=($(ls --quoting-style=shell-always))"
[ "${#files[@]}" -eq 0 ] || printf '%s\0' "${files[@]}" |
  xargs -0 printf '<%s>\n'
Best Answer
This is to do with writes to pipes. With -L16 you are running one process for each 16 files, which produces about a thousand characters, depending on how long the filenames are. With -L64 you are at about four thousand. The ls program almost certainly uses the stdio library, and almost certainly uses a 4kB buffer for output to reduce the number of write calls.
So find produces a load of filenames, then (for the -L64 case) xargs chops them into bundles of 64 and starts up 4 ls processes to handle them. Each ls will generate its first 4kB of output and write it to the pipe to sort. Note that this 4kB will typically not end with a newline. So say the third ls gets its first 4kB ready first, and it ends partway through a line, and then the first ls outputs something. Then the input to sort will include a fused line such as:
lrwxrwxrwx 1 root root 6 Oct 2total 123459
In the -L16 case, the ls processes will (usually) only output a complete set of results in one go.
Of course for this case you are just wasting time and resources by using xargs and ls; you should just let find output the information it already has rather than running extra programs to discover the information again.
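As a rough sketch of that last point: GNU find has a -printf action that prints metadata find has already collected, so no ls processes (and no interleaved pipe writes) are involved. This assumes GNU find, since -printf is not in POSIX, and the directory and file names below are made up for the demo.

```shell
demo=$(mktemp -d)
printf 'hello' > "$demo/five_bytes.txt"
printf 'hi'    > "$demo/two_bytes.txt"

# %s = size in bytes, %f = file name without leading directories.
# A single find process writes every line itself.
find "$demo" -type f -printf '%s %f\n' | sort -n
# 2 two_bytes.txt
# 5 five_bytes.txt
```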