I think I understand what you are trying to accomplish:
- For each hit to the web site, which is logged by the web server:
- If the visit is "unique" (how do you define this?), log the entry and send an audible notification.
The trick is how you define "unique". Is it by URL, by IP address, by cookie?
Your approach with awk was arguably the right way to go, but you got snagged by shell-escaping rules.
So here is something that sort of combines your approaches. First, you really need a script on the web server to do this; otherwise you're going to be lost in complex quotation-escaping rules. Second, I'm assuming your web server is using the common log format, which, frankly, sucks for this kind of work, but we can work with it.
    while true; do
        ssh root@speedy remote-log-capturing-script
    done > unique-visits.log
Use mikeserv's excellent suggestion about MAILFILE. The script on speedy should look like this:
    #!/bin/sh
    tail -1f /var/log/apache2/www.access.log |
        awk '$(NF-1) == 200' |                  # keep only status-200 hits
        grep --line-buffered -o '"GET [^"]*"' | # keep just the request part
        awk '!url[$2] { print; url[$2] = 1 }'   # $2 is the URL; print it once
Awk is usually line-buffered here, but not always: if yours isn't (mawk, for instance), use its -Winteractive switch or add an fflush() after the print. The first awk ensures you're only getting actual successful hits, not cached hits or 404s. The grep -o prints out only the matching part of the input, in this case the quoted request, so the URL ends up in $2. (--line-buffered is GNU grep, which I assume you are using. If not, use the stdbuf trick.) The last awk uses a little expression to conditionally print the input line, but only if that line's URL was never seen before.
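To make the field positions concrete, here's how one (made-up) log line flows through that pipeline:

    # sample common-log line (the status code is the 2nd-to-last field,
    # i.e. $(NF-1); the byte count is the last):
    #   203.0.113.7 - - [10/Oct/2025:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326
    #
    # awk '$(NF-1) == 200'   -> the line passes (status is 200)
    # grep -o '"GET [^"]*"'  -> "GET /index.html HTTP/1.1"
    # awk '!url[$2]{...}'    -> $2 is /index.html; printed the first time,
    #                           suppressed on every later hit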
You can also do this with perl to achieve more complexity within one fork:
    #!/bin/sh
    tail -1f /var/log/apache2/www.access.log |
        perl -lane '$|=1;' \
             -e 'if ($F[$#F-1] eq "200" and ' \
             -e '    /\s"GET\s([^"]*)"\s/ and !$url{$1}) { ' \
             -e '        print $1; $url{$1} = 1; }'
Now both of these will only print unique URLs. What if two web clients at different IPs hit the same page? You only get one output. To change that with the perl solution is easy: modify the key that goes into %url.
    $url{$F[0],$1}
When using perl -a, $F[0] represents the first white-space-delimited field of the input, just like awk's $1 -- i.e., the connecting hostname/IP address. And perl's $1 represents the first matching subexpression of the regular expression /\s"GET\s([^"]*)"\s/ -- i.e., just the URL itself. The cryptic $F[$#F-1] means the 2nd-to-last field of the input line.
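So a sketch of the per-client variant, differing from the script above only in the hash key and the printed output, might be:

    #!/bin/sh
    tail -1f /var/log/apache2/www.access.log |
        perl -lane '$|=1;' \
             -e 'if ($F[$#F-1] eq "200" and ' \
             -e '    /\s"GET\s([^"]*)"\s/ and !$url{$F[0],$1}) { ' \
             -e '        print "$F[0] $1"; $url{$F[0],$1} = 1; }'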
I believe your issue boils down to improperly quoting your expansions.
Quoting from the zsh manual, chapter 14 (Expansion):
    A command enclosed in parentheses preceded by a dollar sign, like
    $(...), or quoted with grave accents, like `...`, is replaced with
    its standard output, with any trailing newlines deleted. If the
    substitution is not enclosed in double quotes, the output is broken
    into words using the IFS parameter. The substitution $(cat foo) may
    be replaced by the equivalent but faster $(<foo). In either case, if
    the option GLOB_SUBST is set, the output is eligible for filename
    generation.
Note that Example #2 in your question results in an infinite echo of NULL, due to:

    If the substitution is not enclosed in double quotes, the output is
    broken into words using the IFS parameter.
In other words, the shell waits on the echo indefinitely: because the default delimiter is SPACE, the echo never completes (see TLDP: Internal Variables), which leaves a hung pipe for the cat command.
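The splitting behaviour the manual describes is easy to see directly (a throwaway example, works in sh or zsh):

    printf '%s\n' $(printf 'a b c')     # unquoted: split on IFS into 3 words
    printf '%s\n' "$(printf 'a b c')"   # quoted: a single word, "a b c"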
As a hunch, I believe examples 4 and 5 work due to output redirection.
Best Answer
stdin is file descriptor 0. Closing a file descriptor of a process is something that can only be done actively by the process itself: stdin is closed when the process decides to close it, period.

Now, when the stdin of a process is the reading end of a pipe, the other end of the pipe can be open in one or more other processes. When all the file descriptors to the other end have been closed, reading from that pipe will return the remaining data still in the pipe, but will then end up returning nothing (instead of waiting for more data), meaning end-of-file.

Applications like cat, cut, wc... that read from their stdin will usually exit when that happens, because their role is to process their input until there is no more input. There's no magic mechanism that causes applications to die when the end of their input is reached; it's only them deciding to exit when that happens.
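For instance (an illustration of that draining behaviour, not from the original question):

    { echo one; echo two; } | { sleep 1; cat; }
    # the writers exit immediately; one second later cat still reads
    # "one" and "two" out of the pipe, then gets end-of-file and exits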
In:

    echo foo | cat

once echo has written "foo\n", it exits, which causes the writing end of the pipe to be closed; the read() done by cat at the other end then returns 0 bytes, which tells cat there is nothing more to read, and cat decides to exit.

In, say:

    true | sleep 1

sleep only exits after 1 second has elapsed. Its stdin becoming a closed pipe has no incidence on that: sleep is not even reading from its stdin.

It's different on the writing end of pipes (or sockets, for that matter).
When all the fds at the reading end have been closed, any attempt to write on an fd open to the writing end causes a SIGPIPE to be sent to the process, which kills it (unless it ignores the signal, in which case the write() fails with EPIPE instead). But that only happens when the process actually tries to write.
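To see SIGPIPE in action (bash shown here for its PIPESTATUS array; zsh has a similar pipestatus):

    yes | head -n 1           # head exits after one line
    echo "${PIPESTATUS[0]}"   # 141 = 128 + 13: yes died of SIGPIPE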
For instance, in something like:

    sleep 1 | true

even though true exits straight away, and the reading end is then closed straight away, sleep is not killed, because it doesn't attempt to write to its stdout.

Now, about /proc/<pid>/fd/<n> showing in red in the ls -l --color output (as mentioned in the first version of your question): that's only because ls does an lstat() on the result of readlink() on that symlink, to try and determine the type of the target of the link.

For file descriptors opened on pipes, or on sockets or files in other namespaces, or on deleted files, the result of readlink will not be an actual path on the file system, so the second lstat() done by ls will fail, and ls will think it's a broken symlink; broken symlinks are rendered in red. You'll get that with any fd on either end of any pipe, whether the other end of the pipe is closed or not. Try with:

    ls --color=always -l /proc/self/fd | cat

for instance.

To determine whether a fd points to a broken pipe, on Linux, you can try lsof with the -E option.
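A rough way to reproduce that scenario (assuming bash for the process substitution, and a lsof recent enough to support -E):

    sleep 100 3> >(true) &   # fd 3: write end of a pipe read by true
    sleep 1                  # true exits at once: no reader is left
    lsof -E -p "$!"          # fd 3 shows a pipe with no endpoint process
    kill "$!"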
For the fd 3, lsof was not able to find any other process at the reading end of the pipe. Beware, though, that you can get output where fds 3 and 5 still point to broken pipes even though no fd at the reading end shows up (there seems to be a bug in lsof, since the fact that sleep also has its fd 3 open on the broken pipe is not reflected everywhere).

To kill a process as soon as the pipe open on its stdin loses its last writer (becomes broken), you could do something like the sketch below. The original code is not shown here; this reconstruction uses perl's IO::Poll, and the wrapper name kill-on-broken-stdin is made up:
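    #!/bin/sh
    # kill-on-broken-stdin CMD [ARGS...]   (hypothetical wrapper)
    # Runs CMD on our stdin and TERMs it as soon as the pipe on stdin
    # loses its last writer; poll(2) then flags POLLERR/POLLHUP on the
    # read end, and IO::Poll reports those even when only POLLERR was
    # requested.
    "$@" &
    child=$!
    perl -MIO::Poll -e '
        my $p = IO::Poll->new;
        $p->mask(\*STDIN, POLLERR);
        $p->poll;                 # blocks until stdin has no writer left
        kill "TERM", $ARGV[0];
    ' "$child"
    wait "$child"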
That watches for an error condition on stdin (and on Linux, that seems to happen as soon as there's no writer left, even if there's data left in the pipe) and kills the child command as soon as it happens. For instance (with the made-up kill-on-broken-stdin wrapper from above):
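    sleep 1 | kill-on-broken-stdin sleep 2   # hypothetical wrapper sketched above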
would kill the sleep 2 process after 1 second.

Now, generally, that's a bit of a silly thing to do: it means you're potentially killing a command before it has had time to process the end of its input. For instance, in (again with the hypothetical wrapper):
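    echo test | kill-on-broken-stdin cat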
you'll find that cat is sometimes killed before it has had time to output (or even to read!) "test\n". There's no way around that: our watcher can't know how much time the command needs to process its input. All we can do is give a grace period before the kill "TERM", hoping that's enough for the command to read the content left in the pipe and do what it needs to do with it.