Note that the problem is not with `tail` but with `head` here, which reads more from the pipe than the first line it is meant to output (so there's nothing left for `tail` to read).
And yes, it's POSIX conformant. `head` is required to leave the cursor within stdin just after the last line it has output when the input is seekable, but not otherwise.
http://pubs.opengroup.org/onlinepubs/9699919799/utilities/V3_chap01.html:
When a standard utility reads a seekable input file and terminates without an error before it reaches end-of-file, the utility shall ensure that
the file offset in the open file description is properly positioned just past the last byte processed by the utility. For files that are not
seekable, the state of the file offset in the open file description for that file is unspecified.
For `head` to be able to do that for a non-seekable file, it would have to read one byte at a time, which would be terribly inefficient¹. That's what the `read` or `line` utilities do, as does GNU `sed` with the `-u` option.
So you can replace `head -n 20` with `gsed -u 20q` if you want that behaviour.
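Where GNU `sed` is not available, the same effect can be sketched with the shell's own `read` builtin, which consumes exactly one line from a pipe and so leaves the rest for `tail`. This is only an illustration; `first_and_last` is a hypothetical helper name, not a standard utility.

```shell
# Print the first and last lines of stdin, even when stdin is a pipe.
# first_and_last is a made-up name used for this illustration.
first_and_last() {
  # read takes one line and no more (byte by byte on a pipe),
  # so the remainder of the stream is still there for tail.
  IFS= read -r first || return
  printf '%s\n' "$first"
  tail -n 1
}

printf '%s\n' one two three four | first_and_last
# prints:
# one
# four
```

With a single-line input only that line is printed, since nothing is left for `tail`.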
Though here, you'd rather want:
sed -e 1b -e '$b' -e d
instead. There is only one tool invocation here, so there is no problem with an internal buffer that can't be shared between two tool invocations. Note, however, that for large files it's going to be less efficient, as `sed` reads the whole file, while for seekable files `tail` would skip most of it by seeking near the end of the file.
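To make the three `sed` expressions easier to follow, here is the same one-liner wrapped in a function with each part commented. `first_last_sed` is a hypothetical name chosen for this sketch.

```shell
# first_last_sed is a made-up wrapper around the one-liner above.
first_last_sed() {
  # 1b : on line 1, branch to the end of the script (line is auto-printed)
  # $b : same on the last line
  # d  : every other line is deleted, so it is never printed
  sed -e 1b -e '$b' -e d "$@"
}

printf '%s\n' alpha beta gamma delta | first_last_sed
# prints:
# alpha
# delta
```

Because a single process reads the whole stream, it behaves the same on pipes and on regular files.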
See the related discussion about buffering at "Why is using a shell loop to process text considered bad practice?"
Note that `tail` must output the tail of the stream on stdin. While, as an optimisation and for seekable files, implementations may seek to the end of the file to get the trailing data from there, they are not allowed to seek back to a point before the position the file offset had when `tail` was invoked (Busybox `tail` used to have that bug).
So for instance in:
{ cat; tail -n 1; } < file
Even though `tail` could seek back to the last line of `file`, it does not. Its stdin is an empty stream, as `cat` left the cursor at the end of the file; `tail` is not allowed to reclaim data from that stream by seeking further backward in the file.
(Text above crossed out pending clarification by the Open Group and considering that it's not done correctly by several implementations)
¹ The `head` builtin of `ksh93` (enabled if you put `/opt/ast/bin` ahead of `$PATH`), for sockets (a type of non-seekable file), instead peeks at the input (using `recvfrom(..., MSG_PEEK)`) prior to actually reading it, to see how much it needs to read to make sure it doesn't read too much. It falls back to reading one byte at a time for other types of files. That is slightly more efficient, and I believe it is the main reason why `ksh93` implements its pipes with `socketpair()`s instead of `pipe()`. Note that it's not completely foolproof, as there's a race condition that could be triggered if another process read from the socket between the peek and the read.
You can use `-path` to match a given depth and prune there. E.g.
find . -path '*/*/*' -prune -o -type d -print
would be the equivalent of `-maxdepth 1`, as `*` matches `.`, `*/*` matches `./dir1`, and `*/*/*` matches `./dir1/dir2`, which is pruned. If you use an absolute starting directory, you need to add a leading `/` to the `-path` pattern too.
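A quick way to try this out is on a throwaway tree. The sketch below assumes `mktemp -d` is available (it is not specified by POSIX), and the directory names and the `list_depth1_dirs` helper are made up for the demonstration.

```shell
# list_depth1_dirs is a hypothetical helper: list directories down to
# depth 1 below $1 without relying on -maxdepth.
list_depth1_dirs() {
  # Any path with two or more slashes (./dir1/dir2 and deeper) matches
  # '*/*/*' and is pruned; "." and "./dir1" do not match and are printed.
  (cd "$1" && find . -path '*/*/*' -prune -o -type d -print | LC_ALL=C sort)
}

tmp=$(mktemp -d)
mkdir -p "$tmp/dir1/dir2/dir3" "$tmp/other/deep"
list_depth1_dirs "$tmp"
# prints:
# .
# ./dir1
# ./other
rm -rf "$tmp"
```

Adding one more `/*` to the pattern moves the cut-off one level deeper.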
I think primarily because:

- The behaviour varies greatly between implementations. See https://www.in-ulm.de/~mascheck/various/shebang/ for all the details.

  It could, however, now specify a minimum subset common to most Unix-like implementations, like `#! *[^ ]+( +[^ ]+)?\n` (with only characters from the portable filename character set in those one or two words), where the first word is an absolute path to a native executable, the line is not too long, the behaviour is unspecified if the executable is setuid/setgid, and it is implementation-defined whether the interpreter path or the script path is passed as `argv[0]` to the interpreter.

- POSIX doesn't specify the path of executables anyway. Several systems have pre-POSIX utilities in `/bin`/`/usr/bin` and have the POSIX utilities somewhere else (like on Solaris 10, where `/bin/sh` is a Bourne shell and the POSIX one is in `/usr/xpg4/bin`; Solaris 11 replaced it with ksh93, which is more POSIX compliant, but most of the other tools in `/bin` are still ancient non-POSIX ones). Some systems are not POSIX ones but have a POSIX mode/emulation. All POSIX requires is that there be a documented environment in which a system behaves POSIXly.

  See Windows+Cygwin for instance. Actually, with Windows+Cygwin, the she-bang is honoured when a script is invoked by a Cygwin application, but not by a native Windows application.

  So even if POSIX specified the shebang mechanism, it could not be used to write POSIX `sh`/`sed`/`awk`... scripts (also note that the shebang mechanism cannot be used to write reliable `sed`/`awk` scripts, as it doesn't allow passing an end-of-option marker).

Now, the fact that it's unspecified doesn't mean you can't use it (well, it says you shouldn't have the first line start with `#!` if you expect it to be only a regular comment and not a she-bang), but that POSIX gives you no guarantee if you do.

In my experience, using shebangs gives you more guarantee of portability than using POSIX's way of writing shell scripts: leave off the she-bang, write the script in POSIX `sh` syntax and hope that whatever invokes the script invokes a POSIX-compliant `sh` on it, which is fine if you know the script will be invoked in the right environment by the right tool, but not otherwise.

You may have to do things like:
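One shape such a workaround could take, as a minimal sketch and not necessarily what the author had in mind: look for an `sh` in the directories reported by `getconf PATH` and re-execute the script with it. `find_posix_sh`, its variables and the `REEXECED` guard are hypothetical names, and the assumption that the first `sh` found there is the POSIX one is untested on the older systems mentioned above.

```shell
# find_posix_sh is a made-up helper. Backticks are used instead of
# $(...) on the assumption that older shells parse them more reliably.
find_posix_sh() {
  # getconf PATH prints a PATH meant to find the standard utilities.
  posix_path=`getconf PATH 2>/dev/null` || return 1
  old_ifs=$IFS
  IFS=:
  for dir in $posix_path; do
    if [ -x "$dir/sh" ]; then
      IFS=$old_ifs
      echo "$dir/sh"
      return 0
    fi
  done
  IFS=$old_ifs
  return 1
}

# Possible use near the top of a script (REEXECED is a made-up guard
# variable that prevents an endless exec loop):
#   if [ -z "$REEXECED" ] && sh=`find_posix_sh`; then
#     REEXECED=1; export REEXECED
#     exec "$sh" "$0" "$@"
#   fi
find_posix_sh
```

The guard means the script re-executes itself once even when it was already running under a conforming `sh`, which keeps the sketch simple at the cost of one extra `exec`.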
If you want to be portable to Windows+Cygwin, you may have to name your file with a `.bat` or `.ps1` extension and use some similar trick for `cmd.exe` or `powershell.exe` to invoke the Cygwin `sh` on the same file.
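Returning to the minimal shebang subset suggested earlier (`#! *[^ ]+( +[^ ]+)?\n`), a rough check of whether a script's first line stays within it could look like the sketch below. It is an approximation: `is_minimal_shebang` is a hypothetical name, the character classes are one reading of "portable filename character set", and it tests neither the line length nor whether the interpreter is a native executable.

```shell
# is_minimal_shebang is a made-up helper. It succeeds if the first line
# of file $1 is "#!", optional spaces, an absolute path, and at most one
# further word, all restricted to A-Z a-z 0-9 . _ - (plus / in the path).
is_minimal_shebang() {
  head -n 1 -- "$1" |
    LC_ALL=C grep -Eq '^#! */[A-Za-z0-9._/-]+( +[A-Za-z0-9._-]+)?$'
}

demo=$(mktemp)          # mktemp is assumed to be available
printf '%s\n' '#! /bin/sh -' 'echo hello' > "$demo"
if is_minimal_shebang "$demo"; then
  echo "within the minimal subset"
else
  echo "outside the minimal subset"
fi
rm -f "$demo"
```

A line such as `#!/usr/bin/env python3 -u` is rejected because it carries two arguments, which is exactly the kind of usage whose behaviour differs between systems.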