The following bash commands go into an infinite loop:
$ echo hi > x
$ cat x >> x
I can guess that cat continues to read from x after it has started writing to stdout. What is confusing, however, is that my own test implementation of cat exhibits different behavior:
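// mycat.c
#include <stdio.h>

int main(int argc, char **argv) {
    if (argc < 2) {
        fprintf(stderr, "usage: %s file\n", argv[0]);
        return 1;
    }
    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror(argv[1]);
        return 1;
    }
    char buf[4096];
    size_t num_read;
    /* Copy one buffer at a time, flushing stdout before the next fread(). */
    while ((num_read = fread(buf, 1, sizeof buf, f)) > 0) {
        fwrite(buf, 1, num_read, stdout);
        fflush(stdout);
    }
    fclose(f);
    return 0;
}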
If I run:
$ make mycat
$ echo hi > x
$ ./mycat x >> x
It does not loop. Given the behavior of cat and the fact that I'm flushing stdout before fread is called again, I would expect this C code to continue reading and writing in a cycle.
How are these two behaviors consistent? What mechanism explains why cat loops while the above code does not?
Best Answer
On an older RHEL system I've got, /bin/cat does not loop for cat x >> x. Instead, cat gives the error message "cat: x: input file is output file". I can fool /bin/cat by doing this: cat < x >> x. When I try your code above, I get the "looping" you describe. I also wrote a system-call-based "cat", and it loops, too. The only buffering there (unlike for the stdio-based "mycat") is what goes on in the kernel.
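The system-call-based program mentioned above is not shown here. As a rough sketch of what such a read()/write() loop could look like (this is not the answer author's code), the function below copies from one descriptor to another with no user-space buffering beyond a single array. The name copy_fd and the max_bytes cap are assumptions added for this illustration; the cap exists only so an experiment on "x >> x" stops instead of filling the disk.

```c
#include <fcntl.h>
#include <unistd.h>

/* Copy from in_fd to out_fd using only read() and write().
 * Returns the number of bytes copied, or -1 on error.
 * max_bytes is a hypothetical safety cap for this sketch; a real cat
 * has no such limit and would keep going. */
long copy_fd(int in_fd, int out_fd, long max_bytes)
{
    char buf[4096];
    long total = 0;
    ssize_t n = 0;

    while (total < max_bytes && (n = read(in_fd, buf, sizeof buf)) > 0) {
        ssize_t off = 0;
        while (off < n) {                  /* cope with short writes */
            ssize_t w = write(out_fd, buf + off, (size_t)(n - off));
            if (w < 0)
                return -1;
            off += w;
        }
        total += n;
    }
    return n < 0 ? -1 : total;
}
```

If in_fd is opened read-only on x and out_fd is opened on the same x in append mode, each write() adds data that the next read() then finds, so the loop should end only when the cap is reached rather than at end-of-file.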
I think what's happening is that file descriptor 3 (the result of open(av[1])) has an offset into the file of 0. File descriptor 1 (stdout) has an offset of 3, because the ">>" causes the invoking shell to do an lseek() on the file descriptor before handing it off to the cat child process. (Many shells instead open the file with O_APPEND, which has the same effect here: every write() lands at the current end of the file.)

Doing a read() of any sort, whether into a stdio buffer or a plain char buf[], advances the position of file descriptor 3. Doing a write() advances the position of file descriptor 1. Those two offsets are different numbers. Because of the ">>", file descriptor 1 always has an offset greater than or equal to the offset of file descriptor 3, so the reader never catches up with end-of-file: each write() adds new data for the next read() to find. So any "cat-like" program will loop, unless it does some internal buffering. It's possible, maybe even likely, that a stdio implementation of a FILE * (which is the type of the symbols stdout and f in your code) includes its own buffer. fread() may actually do a read() system call to fill the internal buffer for f. This may or may not change anything in the insides of stdout. Calling fwrite() on stdout may or may not change anything inside of f. So a stdio-based "cat" might not loop. Or it might. Hard to say without reading through a lot of ugly, ugly libc code.

I did an strace on the RHEL cat - it just does a succession of read() and write() system calls. But a cat doesn't have to work this way. It would be possible to mmap() the input file, then do write(1, mapped_address, input_file_size). The kernel would do all the work. Or you could do a sendfile() system call between the input and output file descriptors on Linux systems. Old SunOS 4.x systems were rumored to do the memory mapping trick, but I don't know if anyone has ever done a sendfile-based cat. In either case the "looping" wouldn't happen, as both write() and sendfile() require a length-to-transfer parameter.
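To make the length-to-transfer point concrete, here is a minimal sketch of the mmap()-then-write() idea, assuming a POSIX system. The function name copy_file_once is hypothetical, and this is not how any particular cat is known to be implemented. The key detail is that the length is taken from fstat() once, before any output is written, so data appended afterwards is never copied.

```c
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Write the contents of the file at path, as sized when the call starts,
 * to out_fd. Returns the number of bytes written, or -1 on error. */
long copy_file_once(const char *path, int out_fd)
{
    int in_fd = open(path, O_RDONLY);
    if (in_fd < 0)
        return -1;

    struct stat st;
    if (fstat(in_fd, &st) < 0) {
        close(in_fd);
        return -1;
    }
    if (st.st_size == 0) {              /* mmap() rejects length 0 */
        close(in_fd);
        return 0;
    }

    size_t len = (size_t)st.st_size;    /* fixed before any write */
    char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, in_fd, 0);
    close(in_fd);                       /* the mapping stays valid */
    if (p == MAP_FAILED)
        return -1;

    size_t done = 0;
    while (done < len) {                /* cope with short writes */
        ssize_t w = write(out_fd, p + done, len - done);
        if (w < 0) {
            munmap(p, len);
            return -1;
        }
        done += (size_t)w;
    }
    munmap(p, len);
    return (long)done;
}
```

Called as copy_file_once("x", 1) with stdout redirected by ">> x", this should append one copy of the original contents and then stop. A sendfile()-based variant would be bounded in the same way, although on Linux the sendfile(2) manual page documents EINVAL when the output descriptor has O_APPEND set, so it may not work with ">>" at all.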