I created a test file named 'test' that contains the following:
xxx
yyy
zzz
I ran the command:
(sed '/y/ q'; echo aaa; cat) < test
and I got:
xxx
yyy
aaa
zzz
Then I ran:
cat test | (sed '/y/ q'; echo aaa; cat)
and got:
xxx
yyy
aaa
Question
sed reads and prints until it encounters a line with 'y', then stops. In the first case, but not the second, cat reads and prints the rest.
Can someone explain what phenomenon is behind this difference in behavior?
I also noticed it works this way in Ubuntu 16.04 and CentOS 6, but in CentOS 7 neither command prints 'zzz'.
Best Answer
sed (and other standard utilities) behaves differently depending on whether the input file is seekable (for example, a regular file) or not seekable (for example, a pipe); see the INPUT FILES section in this link. Quote from the doc:
So in the first command, where the input is redirected from the regular file, sed performed the q (quit) command before reaching EOF and left the file offset at the beginning of the zzz line, so cat can continue printing the remaining lines (GNU sed is not POSIX compliant in some conditions, see below). And continuing from the doc:
In the pipe case, the behavior is unspecified. Most standard tools, including sed, will consume as much input as possible. sed reads past the yyy line and quits without restoring the file offset, so nothing is left for cat.
GNU sed is not compliant with the standard here; its behavior depends on the system's stdio implementation and glibc version. The results here were obtained on Mac OS X 10.11.6 and on virtual machines running CentOS 7.2 (glibc 2.17) and Ubuntu 14.04 (glibc 2.19), all hosted on OpenStack with a Ceph backend.
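The two cases can also be reproduced without sed, which may help separate the file-offset effect from sed's own buffering. The following is only a minimal sketch, assuming a POSIX-like shell with dd and mktemp available; the scratch file and the quit_at_match helper are hypothetical names introduced here, and the helper only imitates sed '/y/ q' with the shell's read builtin.

```shell
#!/bin/sh
# Hypothetical scratch file with the same contents as the question's 'test'.
tmp=$(mktemp)
printf 'xxx\nyyy\nzzz\n' > "$tmp"

# Seekable input: dd issues a single 4-byte read, so the shared file offset
# stops right after "xxx\n" and cat carries on from "yyy".
from_file=$( { dd bs=4 count=1 of=/dev/null 2>/dev/null; cat; } < "$tmp" )

# Pipe: the read builtin must not consume bytes past the newline, so the
# lines after the match stay in the pipe for cat.
# quit_at_match is a hypothetical stand-in for sed '/y/ q'.
quit_at_match() {
    while IFS= read -r line; do
        printf '%s\n' "$line"
        case $line in *y*) break ;; esac
    done
}
from_pipe=$( cat "$tmp" | { quit_at_match; echo aaa; cat; } )

printf '%s\n' "$from_file"
printf '%s\n' "$from_pipe"
rm -f "$tmp"
```

In the first group cat starts at yyy because both commands share one open file description. In the second group zzz survives only because read never takes more from the pipe than it needs, which is the same price sed pays when it is forced to read without buffering.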
On those systems, you can use the -u option to get the standard behavior, for the regular file and for the pipe alike, but this leads to terribly inefficient performance, because sed has to read one byte at a time. A partial output from strace: