The problem is that your /etc/mtab is not a file but a symlink to /proc/mounts. This has advantages, but also the disadvantage that umount as a user does not work. You already guessed the reason correctly: "the system is confused when remembering who mounted the file system". This information is normally written to mtab, but in your case it cannot be written there. The kernel doesn't care about (doesn't even know of) user mounts, since they are a userspace feature, so this information is not contained in /proc/mounts.
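To see whether this applies to you, check what /etc/mtab actually is. A minimal sketch, demonstrated on a throwaway symlink in a temp directory so it runs anywhere; on your machine you would point MTAB at /etc/mtab itself:

```shell
# Detect whether the given mtab path is a symlink and where it points.
# Hypothetical setup: we create our own symlink so the demo is self-contained.
tmpdir=$(mktemp -d)
ln -s /proc/mounts "$tmpdir/mtab"
MTAB="$tmpdir/mtab"     # substitute /etc/mtab on a real system

if [ -L "$MTAB" ]; then
    target=$(readlink "$MTAB")
    echo "symlink -> $target"
else
    echo "regular file"
fi
rm -rf "$tmpdir"
```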
Do this:
cd /etc
cp mtab mtab.file
rm mtab
mv mtab.file mtab
umount as user should work after you have mounted the volume again.
I think I understand what you are trying to accomplish:
- For each hit to the web site, which is logged by the web server:
- If the visit is "unique" (how do you define this??) log the entry and send an audible notification.
The trick is how you define "unique". Is it by URL, by IP address, by cookie?
Your approach with awk was arguably the right way to go, but you got snagged by shell-escaping rules.
So here is something that sort of combines your approaches. First, you really need a script on the web server to do this; otherwise you're going to get lost in complex quotation-escaping rules. Second, I'm assuming your web server is using the "common log format", which, frankly, sucks for this kind of work, but we can work with it.
while true; do
ssh root@speedy remote-log-capturing-script
done > unique-visits.log
Use mikeserv's excellent suggestion about MAILFILE. The script on speedy should look like this:
#!/bin/sh
tail -1f /var/log/apache2/www.access.log |
awk '$(NF-1) == 200' |
grep --line-buffered -o '"GET [^"]*"' |
awk '!url[$2]{ print; url[$2]=1 }'
Awk typically flushes its output at each line when writing to a terminal, but several implementations block-buffer when writing into a pipe; if lines come out delayed, add an fflush() after the print (or wrap the command in stdbuf). The first awk ensures you're only getting actual successful hits (status 200), not cached hits or 404s. The grep -o prints out only the matching part of the input, in this case the quoted request line containing the URL. (--line-buffered is a GNU grep option, which I assume you are using. If not, use the stdbuf trick.) The next awk uses a little expression to conditionally print out the input line -- only if that line's URL was never before seen.
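The seen-array idiom in that last stage is worth seeing in isolation. A small demo on made-up URLs (nothing here comes from your logs):

```shell
# Print each value only the first time it appears; later duplicates are skipped.
out=$(printf '/a\n/b\n/a\n/c\n' | awk '!seen[$1] { print; seen[$1]=1 }')
echo "$out"
```

This prints /a, /b, /c -- the second /a is suppressed.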
You can also do this with perl to achieve more complexity within one fork:
#!/bin/sh
tail -1f /var/log/apache2/www.access.log |
perl -lane '$|=1;' \
-e 'if ($F[$#F-1] eq "200" and ' \
-e ' /\s"GET\s([^"]*)"\s/ and !$url{$1}) { '\
  -e '    print $1; $url{$1} = 1; }'
Now both of these will only print unique URLs. What if two web clients from different IPs hit the same page? You only get one output. To change that, with the perl solution it is easy: modify the key that goes into %url.
$url{$F[0],$1}
When using perl -a, $F[0] represents the first whitespace-delimited field of the input, just like awk's $1 -- i.e., the connecting hostname/IP address. And perl's $1 represents the first matching subexpression of the regular expression /\s"GET\s([^"]*)"\s/, i.e., just the URL itself. The cryptic $F[$#F-1] means the 2nd-to-last field of the input line.
Simplified, stdbuf is a wrapper around stdio functionality. Line buffering of input streams is undefined in stdio; I can find no standards document that says what it means, so it is literally meaningless as far as the standards go.
Assuming behavior analogous to stdout line buffering, the line buffering of stdin would require calling read() once for each character read, because there is no other way to guarantee that you don't read past a newline on a descriptor. Since the point of buffering is to reduce the number of system calls, it is unsurprising that the stdio library doesn't implement this.
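What stdbuf does do is adjust a child process's stdio output buffering (in GNU coreutils, via an LD_PRELOAD shim). A minimal sketch of the usual -oL invocation; the sample text is made up:

```shell
# Force grep's stdout to be line-buffered; with this small finite input the
# effect isn't visible, but the invocation shape is the point.
out=$(printf 'foo\nbar\nfoo baz\n' | stdbuf -oL grep foo)
echo "$out"
```

With a fast, finite input like this the result is the same either way (foo, then foo baz); the flag matters when the producer trickles lines out and a downstream consumer wants each line immediately.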