The second example:
find . -name '*.txt' -print0 | xargs -0 cat > out.txt
is completely legal and will recreate the file out.txt each time it's run, while the first will append to out.txt if it runs. But both commands are doing essentially the same thing.
What's confusing the issue is the xargs -0 cat part. People think that the redirect to out.txt is part of that command when it isn't. The redirect happens after xargs -0 cat has taken input in via STDIN and cat'ed it as a single stream out to STDOUT. The xargs is optimizing the cat'ing of the files, not their output.
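A quick way to see that the redirect belongs to the shell, not to xargs — a sketch using two hypothetical files, a.txt and b.txt:

```shell
# two sample files (hypothetical names)
echo one > a.txt
echo two > b.txt

# The shell opens out.txt and attaches it to the STDOUT of the xargs stage
# *before* xargs runs, so the redirect is not "part of" the xargs command:
printf 'a.txt\0b.txt\0' | xargs -0 cat > out.txt

# equivalent: the xargs stage grouped explicitly, with the redirect on the group
printf 'a.txt\0b.txt\0' | { xargs -0 cat; } > out.txt
```

Both forms produce an identical out.txt, because the redirection is set up by the shell around the whole xargs stage either way.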
Here's an example that shows what I'm saying. If we insert a pv -l between the xargs -0 cat and the output to the file out.txt, we can see how many lines cat has written.
Example
To show this I created a directory with 10,000 files in it.
for i in `seq -w 1 10000`;do echo "contents of file$i.txt" > file$i.txt;done
Each file looks similar to this:
$ more file00001.txt
contents of file00001.txt
The output from pv:
$ find . -name '*.txt' -print0 | xargs -0 cat | pv -l > singlefile.rpt
10k 0:00:00 [31.1k/s] [ <=>
As we can see, 10k lines were written out to my singlefile.rpt file. If xargs were passing us chunks of output, we'd see that as a reduction in the number of lines being presented to pv.
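The same check can be made without pv, by counting lines after the fact (a sketch, assuming the same directory of one-line .txt files):

```shell
# cat all the .txt files into one report, then count what was written;
# the .rpt extension keeps the output file out of find's matches
find . -name '*.txt' -print0 | xargs -0 cat > singlefile.rpt
wc -l < singlefile.rpt   # one line per input file, so 10000 in the example above
```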
From http://www.manpagez.com/man/1/ksh/:
<>word Open file word for reading and writing as standard output.
<&digit The standard input is duplicated from file descriptor
digit (see dup(2)). Similarly for the standard output
using >&digit.
<&- The standard input is closed. Similarly for the standard
output using >&-.
You will find all those details by typing man ksh.
In particular, 2>&- means: close the standard error stream, i.e. the command is no longer able to write to STDERR, which breaks the usual assumption that STDERR is open and writable.
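A sketch of the difference between closing stderr and merely discarding it (grep is chosen arbitrarily here):

```shell
# stderr writes succeed, but the text is thrown away:
grep foo /no/such/file 2>/dev/null

# stderr is closed; grep's write to fd 2 fails with EBADF and the message is lost:
grep foo /no/such/file 2>&-
```

The first form is almost always what you want; the second can make programs misbehave, since they expect fd 2 to be writable.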
To understand the concept of file descriptors, (if on a Linux system) you may have a look at /proc/*/fd (and/or /dev/fd/*):
$ ls -l /proc/self/fd
insgesamt 0
lrwx------ 1 michas users 1 18. Jan 16:52 0 -> /dev/pts/0
lrwx------ 1 michas users 1 18. Jan 16:52 1 -> /dev/pts/0
lrwx------ 1 michas users 1 18. Jan 16:52 2 -> /dev/pts/0
lr-x------ 1 michas users 1 18. Jan 16:52 3 -> /proc/2903/fd
File descriptor 0 (aka STDIN) is used by default for reading, fd 1 (aka STDOUT) is the default for writing, and fd 2 (aka STDERR) is the default for error messages. (fd 3 is in this case used by ls to actually read that directory.)
If you redirect stuff it might look like this:
$ ls -l /proc/self/fd 2>/dev/null </dev/zero 99<>/dev/random |cat
insgesamt 0
lr-x------ 1 michas users 1 18. Jan 16:57 0 -> /dev/zero
l-wx------ 1 michas users 1 18. Jan 16:57 1 -> pipe:[28468]
l-wx------ 1 michas users 1 18. Jan 16:57 2 -> /dev/null
lr-x------ 1 michas users 1 18. Jan 16:57 3 -> /proc/3000/fd
lrwx------ 1 michas users 1 18. Jan 16:57 99 -> /dev/random
Now the default descriptors no longer point to your terminal but to the corresponding redirects. (As you can see, you can also create new fds.)
One more example for <>:
echo -e 'line 1\nline 2\nline 3' > foo # create a new file with three lines
( # with that file redirected to fd 5
read <&5 # read the first line
echo "xxxxxx">&5 # overwrite the second line
cat <&5 # output the remaining line
) 5<>foo # this is the actual redirection
You can do such things, but you very seldom have to do so.
Best Answer
Looking at the two commands separately:
Here, since redirections are processed in a left-to-right manner, the standard error stream would first be redirected to wherever the standard output stream goes (possibly to the console), and then the standard output stream would be redirected to a file. The standard error stream would not be redirected to that file.
The visible effect of this would be that you get what's produced on standard error on the screen and what's produced on standard output in the file.
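The command under discussion is not shown here, but the described behaviour can be reproduced with any command that writes to both streams; a sketch, with a compound command standing in for the utility:

```shell
# 2>&1 first: stderr is duplicated onto the *old* stdout (the terminal),
# and only then is stdout redirected to the file
{ echo out; echo err >&2; } 2>&1 >stdout.log
# "err" appears on the terminal, while "out" lands in stdout.log
```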
Here, you redirect standard error to the same place as the standard output stream. This means that both streams will be piped to the tee utility as a single intermingled output stream, and that this standard output data will be saved to the given file by tee. The data would additionally be reproduced by tee in the console (this is what tee does, it duplicates data streams).
Whichever one of these is used depends on what you'd like to achieve.
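A minimal sketch of this second form, again using a compound command in place of the actual utility:

```shell
# both streams are merged onto stdout before the pipe, so tee sees both lines
{ echo out; echo err >&2; } 2>&1 | tee both.log
# both lines show up on the terminal *and* in both.log (possibly intermingled)
```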
Note that you would not be able to reproduce the effect of the second pipeline with just > (as in utility >output.log 2>&1, which would save both standard output and error in the file by first redirecting standard output to the output.log file and then redirecting standard error to where standard output is now going). You would need to use tee to get the data in the console as well as in the output file.
Additional notes:
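For comparison, the utility >output.log 2>&1 form mentioned above sends both streams to the file and nothing to the console; a sketch with a stand-in compound command:

```shell
# stdout goes to the file first, then stderr is duplicated onto it
{ echo out; echo err >&2; } >output.log 2>&1
# the terminal stays silent; output.log contains both lines
```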
The visible effect of the first command would be the same as redirecting only the standard output to the file: the standard output goes to the file and standard error goes to the console.
If a further processing step were added to the end of each of the above commands, there would be a big difference though:
In the first pipeline, more_stuff would get what's originally the standard error stream from utility as its standard input data, while in the second pipeline, since it's only the resulting standard output stream that is ever sent across a pipe, the more_stuff part of the pipeline would get nothing to read on its standard input.
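A sketch of that difference, with a stand-in utility function that writes one line to each stream, and wc -l playing the role of more_stuff:

```shell
# hypothetical stand-in for the utility under discussion
utility() { echo out; echo err >&2; }

# pipe set up first, so 2>&1 points stderr at the pipe; stdout then goes to the file
utility 2>&1 >out1.log | wc -l   # counts 1: the stderr line went into the pipe

# stdout goes to the file first, then stderr follows it there; the pipe gets nothing
utility >out2.log 2>&1 | wc -l   # counts 0: both lines went into the file
```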