Shell – Why does cat x >> x loop

cat, files, io-redirection, shell

The following bash commands go into an infinite loop:

$ echo hi > x
$ cat x >> x

I can guess that cat continues to read from x after
it has started writing to stdout. What is confusing, however,
is that my own test implementation of cat exhibits different behavior:

// mycat.c
#include <stdio.h>

int main(int argc, char **argv) {
  FILE *f = fopen(argv[1], "rb");
  char buf[4096];
  int num_read;
  while ((num_read = fread(buf, 1, 4096, f))) {
    fwrite(buf, 1, num_read, stdout);
    fflush(stdout);
  }

  return 0;
}

If I run:

$ make mycat
$ echo hi > x
$ ./mycat x >> x

It does not loop. Given the behavior of cat and the fact that I'm
flushing stdout before fread is called again, I would expect this C code to keep reading and writing in a cycle.

How are these two behaviors consistent? What mechanism explains why cat loops while the above code does not?

Best Answer

On an older RHEL system I've got, /bin/cat does not loop for cat x >> x. cat gives the error message "cat: x: input file is output file". I can fool /bin/cat by doing this: cat < x >> x. When I try your code above, I get the "looping" you describe. I also wrote a system call based "cat":

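#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
int
main(int ac, char **av)
{
        char buf[4096];
        int fd;
        ssize_t cc;
        /* Bail out if no file was named or it cannot be opened. */
        if (ac < 2 || (fd = open(av[1], O_RDONLY)) < 0)
                return 1;
        /* Copy until read() reports end of file (0) or an error (-1). */
        while ((cc = read(fd, buf, sizeof(buf))) > 0)
                write(1, buf, cc);
        close(fd);
        return 0;
}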

This loops, too. The only buffering here (unlike for stdio-based "mycat") is what goes on in the kernel.
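The "input file is output file" refusal mentioned above can be approximated by comparing device and inode numbers of the two descriptors. This is only a sketch of the general idea: the names same_file and open_input_checked are made up for illustration, and the real /bin/cat may apply further conditions that are not reproduced here.

```c
#include <fcntl.h>
#include <stdbool.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Returns true when both descriptors refer to the same file, judged by
 * device and inode number. Returns false if either fstat() call fails. */
bool same_file(int fd_a, int fd_b)
{
        struct stat a, b;

        if (fstat(fd_a, &a) != 0 || fstat(fd_b, &b) != 0)
                return false;
        return a.st_dev == b.st_dev && a.st_ino == b.st_ino;
}

/* Hypothetical helper: opens path read-only and returns the descriptor.
 * Returns -1 if the open fails or if path is the same file as out_fd. */
int open_input_checked(const char *path, int out_fd)
{
        int fd = open(path, O_RDONLY);

        if (fd < 0)
                return -1;
        if (same_file(fd, out_fd)) {
                fprintf(stderr, "%s: input file is output file\n", path);
                close(fd);
                return -1;
        }
        return fd;
}
```

A cat-like program would call open_input_checked(av[1], 1) instead of a bare open(). Note that a check like this applied only to named operands would not see standard input, which would be consistent with cat < x >> x slipping past it.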

I think what's happening is that file descriptor 3 (the result of open(av[1])) has an offset into the file of 0. File descriptor 1 (stdout) effectively writes at offset 3, the end of the file, because the ">>" causes the invoking shell to open the file for appending (or to do an lseek() to the end) before handing the descriptor off to the cat child process.

Doing a read() of any sort, whether into a stdio buffer or a plain char buf[], advances the position of file descriptor 3. Doing a write() advances the position of file descriptor 1. Those two offsets are different numbers. Because of the ">>", file descriptor 1 always has an offset greater than or equal to the offset of file descriptor 3, so the reader never catches up with the end of the file. Any "cat-like" program will therefore loop, unless it does some internal buffering.

It's possible, maybe even likely, that a stdio implementation of a FILE * (which is the type of the symbols stdout and f in your code) includes its own buffer. fread() may actually do a read() system call to fill the internal buffer of f. This may or may not change anything in the insides of stdout. Calling fwrite() on stdout may or may not change anything inside of f. So a stdio-based "cat" might not loop. Or it might. Hard to say without reading through a lot of ugly, ugly libc code.
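To watch the two offsets chase each other without hanging a terminal, here is a small sketch that does what cat x >> x does but gives up after a fixed number of rounds. The function name chase_rounds and the round limit are made up for illustration, and the append-mode open stands in for whatever the shell does for ">>".

```c
#include <fcntl.h>
#include <unistd.h>

/* Copies path onto its own end, the way "cat x >> x" does, but stops
 * after max_rounds read/write pairs. Returns the number of rounds in
 * which read() still found data, or -1 if the file cannot be opened. */
int chase_rounds(const char *path, int max_rounds)
{
        char buf[4096];
        int rounds = 0;
        int in = open(path, O_RDONLY);              /* read offset starts at 0 */
        int out = open(path, O_WRONLY | O_APPEND);  /* writes land at end of file */

        if (in < 0 || out < 0) {
                if (in >= 0)
                        close(in);
                if (out >= 0)
                        close(out);
                return -1;
        }
        while (rounds < max_rounds) {
                ssize_t n = read(in, buf, sizeof(buf));

                if (n <= 0)
                        break;  /* end of file or error: nothing left to copy */
                if (write(out, buf, (size_t)n) != n)
                        break;
                rounds++;       /* the file just grew by n bytes again */
        }
        close(in);
        close(out);
        return rounds;
}
```

For a file holding "hi\n", every round reads the 3 bytes that the previous round appended, so read() never returns 0 and the function only stops because of the limit. For an empty file the first read() hits end of file and no rounds are counted.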

I did an strace on the RHEL cat - it just does a succession of read() and write() system calls. But a cat doesn't have to work this way. It would be possible to mmap() the input file, then do write(1, mapped_address, input_file_size). The kernel would do all the work. Or you could do a sendfile() system call between the input and output file descriptors on Linux systems. Old SunOS 4.x systems were rumored to do the memory mapping trick, but I don't know if anyone has ever done a sendfile-based cat. In either case the "looping" wouldn't happen, as both write() and sendfile() require a length-to-transfer parameter.
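The same "fixed length" idea works with plain read() and write(): take the input size once, up front, and never copy more than that. The sketch below assumes in_fd is a regular file positioned at offset 0; copy_bounded is a name invented for this example, not something any real cat is known to use.

```c
#include <fcntl.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Copies at most the number of bytes in_fd held when the call started.
 * Returns the number of bytes copied, or -1 on error. */
off_t copy_bounded(int in_fd, int out_fd)
{
        struct stat st;
        char buf[4096];
        off_t left, copied = 0;

        if (fstat(in_fd, &st) != 0)
                return -1;
        left = st.st_size;      /* length to transfer, fixed before any write */
        while (left > 0) {
                size_t want = left < (off_t)sizeof(buf) ? (size_t)left : sizeof(buf);
                ssize_t n = read(in_fd, buf, want);

                if (n < 0)
                        return -1;
                if (n == 0)
                        break;  /* file shrank underneath us */
                if (write(out_fd, buf, (size_t)n) != n)
                        return -1;
                left -= n;
                copied += n;
        }
        return copied;
}
```

With x containing "hi\n" and out_fd opened on x in append mode, this copies 3 bytes and stops, leaving a 6-byte file, because bytes appended during the copy are never counted in the length.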

Example

These two examples do essentially the same thing but get their input in two slightly different ways.

opens file

$ cat blah.txt 
hi

opens STDIN

$ cat < blah.txt 
hi

Peeking behind the curtain

You can use strace to see what's going on.

When we read from a file

open("blah.txt", O_RDONLY)              = 3
fstat(3, {st_mode=S_IFREG|0664, st_size=3, ...}) = 0
fadvise64(3, 0, 0, POSIX_FADV_SEQUENTIAL) = 0
read(3, "hi\n", 65536)                  = 3
write(1, "hi\n", 3hi
)                     = 3
read(3, "", 65536)                      = 0
close(3)                                = 0
close(1)                                = 0

When we read from STDIN (identified as 0)

read(0, "hi\n", 65536)                  = 3
write(1, "hi\n", 3hi
)                     = 3
read(0, "", 65536)                      = 0
close(0)                                = 0
close(1)                                = 0

In the first example we can see that cat opened the file blah.txt itself (getting descriptor 3) and read from it. In the second, the shell opened blah.txt, and cat reads its contents via the STDIN file descriptor, identified as descriptor number 0.
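Inside a cat-like program, the difference between the two traces comes down to which descriptor gets read. A minimal sketch, with input_fd as a made-up helper name:

```c
#include <fcntl.h>
#include <unistd.h>

/* Picks the descriptor to read from: the named file when an operand is
 * given ("cat blah.txt"), otherwise standard input ("cat < blah.txt").
 * Returns -1 if the named file cannot be opened. */
int input_fd(int argc, char **argv)
{
        if (argc > 1)
                return open(argv[1], O_RDONLY); /* a new descriptor, 3 in the trace above */
        return STDIN_FILENO;                    /* descriptor 0, already set up by the shell */
}
```

Either way the read() calls that follow look the same; only the descriptor number differs.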

Why does “cat ttyUSB0” not produce output

I think for serial devices you have to set the baud rate before they do anything. I'm not sure how to do that from the command line in order to get cat to work, but you could use a terminal emulator which takes care of it.

Try minicom or screen (i.e. screen /dev/ttyUSB0 115200 - replace 115200 with the baud rate of your IR receiver.)
