I have the following two test files:
test1 test2
Both of them are blank. Now I issue the following commands:
$ cat > test1
Enter
This is a test file
Enter
Ctrl + D
$ cat > test2
Enter
This is another test file
Enter
^C
Ctrl + C
$
Now I check the contents of the two files:
$ cat test1
This is a test file
$ cat test2
This is another test file
$
So is there any real difference between these two methods, given that they appear to produce the same outcome?
Best Answer
When the cat command is running, the terminal is in canonical input mode. This means, in short, that the terminal's line discipline is handling line editing, and is responding to all of the special characters configured for the terminal (viewable and settable with the stty command).

The cat command is simply read()ing from its standard input until a read() call returns zero bytes read, the POSIX convention for hitting end of file.

Terminals do not really have an "end". But there is a circumstance where read() of a terminal device returns zero bytes. When the line discipline receives the "EOF" special character, whatever that happens to be configured as at the time, it causes read() to return with whatever is in the editing buffer at that point. If the editing buffer was empty, that returns zero bytes read from read(), causing cat to exit.

cat also exits in response to signals whose default actions are to terminate the process. The line discipline also generates signals in response to special characters. The "INTR" and "QUIT" special characters cause the INT and QUIT signals to be sent to the foreground process (group), which will be/contain the cat process. The default action of these signals is to terminate the cat process.

Which leads to the observable differences:

- The "EOF" special character will not cause cat to terminate when the line is not in fact empty at the time. An interrupt generated by Ctrl+C will, though.
- An implementation of cat in the C language will block buffer standard output if it finds it directed at a file, as in the question. In theory, this could lead to buffered and not yet output lines being lost if cat is terminated by SIGINT.

  In practice, the BSD and GNU C libraries implement a buffering mode that is not described in the C or C++ language standards. Standard output when redirected to file or pipe is smart buffered. It is block buffered; except that whenever the C library finds itself about to read() the beginning of a new line from any file descriptor that is open to a terminal device, it flushes standard output. (The BSD and GNU C libraries do not quite implement the same semantics and do more than this, strictly speaking, but this behaviour is a common subset.) Thus an interrupt signal will not cause lost buffered output when cat is built on top of such a C library.
- If cat is part of a command pipeline, some other process could be buffering the data, downstream of cat, before those data reach an output file. So again when the line discipline sends SIGINT, which (by default) terminates all of the processes in the pipeline, input data buffered and not yet written will be lost; whereas terminating cat normally with the "EOF" special character will cause the pipeline to terminate normally, with all of the data passing to the downstream process before it receives an EOF indication from its read() of its standard input.

Note that this bears very little relationship to what happens when your interactive shell is reading a line of input from you. When your shell is waiting for input, the terminal is in non-canonical input mode, in which mode the line discipline does not do any special handling of special characters. How your shell treats Ctrl+D and Ctrl+C is entirely up to the input editing library that your shell uses (libedit, readline, or ZLE) and how that editing library has been configured (with key bindings and suchlike).
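To make the read() behaviour described earlier in this answer concrete, here is a minimal sketch in Python. It is not how any real cat is implemented. The helper names cat_loop and type_and_read are hypothetical, os.read stands in for the read() system call, and a pseudo-terminal from pty.openpty() stands in for a real terminal. The sketch explicitly puts the pseudo-terminal in canonical mode with Ctrl+D (byte 0x04) as the "EOF" special character, so that it does not depend on the system defaults.

```python
import os
import pty
import termios


def cat_loop(in_fd, out_fd):
    """Copy in_fd to out_fd until read() returns zero bytes (end of file).

    Returns the number of bytes copied.
    """
    total = 0
    while True:
        chunk = os.read(in_fd, 4096)
        if not chunk:
            # Zero bytes read: the convention for end of file.
            return total
        view = chunk
        while view:
            # os.write may write fewer bytes than requested.
            written = os.write(out_fd, view)
            view = view[written:]
        total += len(chunk)


def type_and_read(keys):
    """Type `keys` into a pseudo-terminal and return what one read() sees.

    `keys` must contain a newline or the EOF character, otherwise the
    read() below blocks forever waiting for the line to be completed.
    """
    master, slave = pty.openpty()
    try:
        attrs = termios.tcgetattr(slave)
        attrs[3] |= termios.ICANON        # lflag: canonical input mode
        attrs[6][termios.VEOF] = b"\x04"  # "EOF" special character: Ctrl+D
        termios.tcsetattr(slave, termios.TCSANOW, attrs)
        os.write(master, keys)            # what the user "types"
        return os.read(slave, 4096)       # what cat's read() would return
    finally:
        os.close(master)
        os.close(slave)
```

On a Linux pseudo-terminal I would expect type_and_read(b"abc\x04") to return b"abc", so a loop like cat_loop keeps running because the line was not empty, whereas type_and_read(b"\x04") returns b"", which is the zero-byte read that makes the loop stop. Other systems may differ in detail.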