Most commands can take their input either from a file, which they open by name, or from a stream of data passed to them via STDIN.
When the output of cat file.txt is sent to another command through a pipe (|), the STDOUT of the command on the left side of the pipe is connected to the STDIN of the command on the right side. If the data is not being passed from STDOUT to STDIN through a pipe, then a command can receive it by opening files whose names are passed as command line arguments.
Examples
cat sends its output to STDOUT.
$ cat file
1
2
3
4
5
Output from cat file is sent via STDOUT through the pipe to grep's STDIN.
$ cat file | grep 5
5
Processing the file as a command line argument.
$ grep 5 file
5
Processing the contents of the file via STDIN directly.
$ grep 5 < <(cat file)
5
The last example demonstrates that the contents of file can be directed to grep via STDIN instead of being opened by name.
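For completeness, the same thing can be done with a here-string (my addition to the original examples; this assumes a shell like bash that supports <<<):
$ grep 5 <<< "$(cat file)"
5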
The easiest way would be to pipe through some program which sets nonblocking output. Here is a simple perl oneliner which does so (you can save it as leakybuffer; a full script version follows below). Your a | b becomes:
a | perl -MFcntl -e \
'fcntl STDOUT,F_SETFL,O_NONBLOCK; while (<STDIN>) { print }' | b
What it does is read the input and write it to the output (same as cat(1)), but the output is nonblocking, meaning that if a write fails, it returns an error and data is lost, while the process continues with the next line of input, as we conveniently ignore the error. The process is kind-of line-buffered, as you wanted, but see the caveat below.
You can test it with, for example:
seq 1 500000 | perl -w -MFcntl -e \
'fcntl STDOUT,F_SETFL,O_NONBLOCK; while (<STDIN>) { print }' | \
while read a; do echo $a; done > output
You will get an output file with lost lines (the exact output depends on the speed of your shell etc.), like this:
12768
12769
12770
12771
12772
12773
127775610
75611
75612
75613
You can see where the shell lost lines after 12773, but also an anomaly: perl didn't have enough buffer space for 12774\n but did for 1277, so it wrote just that -- and so the next number, 75610, does not start at the beginning of a line, which is a little ugly.
That could be improved upon by having perl detect when a write did not succeed completely, and then later try to flush the remainder of the line while ignoring new lines coming in; but that would complicate the perl script much more, so it is left as an exercise for the interested reader :)
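For the curious, here is one rough sketch of that improvement (my addition, not part of the original oneliner): remember the unwritten tail of a cut line and flush it later, dropping whole lines that arrive in the meantime.
#!/usr/bin/perl -w
# Sketch: like leakybuffer, but if a line is cut by a partial write,
# keep its unwritten tail and retry it; new lines that arrive while
# a tail is pending are dropped whole, so output stays line-aligned.
use Fcntl;
fcntl STDOUT, F_SETFL, O_NONBLOCK or die "fcntl: $!";
my $pending = '';
while (my $line = <STDIN>) {
    if (length $pending) {
        my $n = syswrite STDOUT, $pending;
        $pending = substr($pending, $n) if defined $n;
        next;                            # drop the new line either way
    }
    my $n = syswrite STDOUT, $line;
    if (defined $n && $n < length $line) {
        $pending = substr($line, $n);    # the line was cut; keep its tail
    }
    # if $n is undef (EAGAIN), the whole line is lost, as before
}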
Update (for binary files):
If you are not processing newline-terminated lines (like log files or similar), you need to change the command slightly, or perl will consume large amounts of memory (depending on how often newline characters appear in your input):
perl -w -MFcntl -e 'fcntl STDOUT,F_SETFL,O_NONBLOCK; while (read STDIN, $_, 4096) { print }'
It will work correctly for binary files too (without consuming extra memory).
Update2 - nicer text file output:
Avoiding output buffers (syswrite instead of print):
seq 1 500000 | perl -w -MFcntl -e \
'fcntl STDOUT,F_SETFL,O_NONBLOCK; while (<STDIN>) { syswrite STDOUT,$_ }' | \
while read a; do echo $a; done > output
This seems to fix the problems with "merged lines" for me:
12766
12767
12768
16384
16385
16386
(Note: one can verify on which lines the output was cut with this oneliner: perl -ne '$c++; next if $c==$_; print "$c $_"; $c=$_' output)
Best Answer
If the output of
inotifywait -q -m ./
is not redirected and you're running it in a terminal emulator, the output will go to a pty device. A pty device is a form of interprocess communication, a bit like a pipe, though with added features to facilitate terminal-like interactions.

At the other end of that pty "pipe", your terminal emulator will read what inotifywait writes and render it on the screen. Doing that rendering is complicated and expensive in CPU time.

If your terminal emulator is slower to empty that pipe than inotifywait is to fill it up, then the pty pipe will get full. When it is full, as with pipes, the writing process blocks (the write() system call doesn't return) until there's free space again in the "pipe".

With my version of Linux, I find that I can write 19457 bytes to a pty device with nothing reading at the other end before it blocks, if I write 1 byte at a time; 19458 bytes if I write 2 bytes at a time; 19712 if I write 256 bytes at a time; and different values if I put the terminal in raw mode or include newlines in the data I send (as they get transformed to CRLFs).
In any case, I don't think that buffer size is customizable.
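One way to reproduce that measurement (my sketch, not the author's code; it assumes the non-core IO::Pty module from CPAN is installed):
#!/usr/bin/perl -w
# Count how many 1-byte writes a pty accepts when nothing reads
# the master side. Writes are made nonblocking so the script gets
# an error (EAGAIN) instead of hanging once the pty is full.
use IO::Pty;
my $pty   = IO::Pty->new;   # master side; we never read from it
my $slave = $pty->slave;    # the side a program like inotifywait writes to
$slave->blocking(0);
my $count = 0;
$count++ while defined(syswrite($slave, 'x'));
print "pty accepted $count bytes before blocking\n";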
inotifywait uses the inotify API to retrieve that list of events. As the inotify(7) man page describes, those events are queued by the kernel, and the queue can hold only a limited number of events.

When inotifywait is blocked on the write() to standard output, it can't process the events put on that queue by the kernel. If that queue itself gets full, events are discarded. On my system, that queue is limited to 16384 events by default (the fs.inotify.max_queued_events sysctl).
Now, when you do:
inotifywait -q -m ./ | cat
This time, we have a pipe in between inotifywait and cat, and a pty between cat and your terminal emulator.

Pipes have a larger buffer than ptys (64KiB by default on Linux, though it can be raised on a per-pipe basis up to the fs.pipe-max-size sysctl value (1MiB by default) using fcntl(fd, F_SETPIPE_SZ, newsize)).

So before inotifywait's write() blocks, we need to fill up both of those buffers. Plus, cat will also have read some data into its own reading buffer and will be waiting to write it itself.
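As an aside, here is how that fcntl() call could be made from perl, in the spirit of the oneliners above (a sketch; F_SETPIPE_SZ is Linux-specific, and its raw value 1031 is hardcoded as an assumption in case your Fcntl module doesn't export it):
#!/usr/bin/perl -w
# Grow the buffer of a fresh pipe with F_SETPIPE_SZ (Linux-only).
use Fcntl;
my $F_SETPIPE_SZ = 1031;                  # F_LINUX_SPECIFIC_BASE + 7
pipe(my $r, my $w) or die "pipe: $!";
my $size = fcntl($w, $F_SETPIPE_SZ, 1048576)
    or die "fcntl: $!";   # asking for more than fs.pipe-max-size fails for non-root
print "pipe buffer is now $size bytes\n";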
For each | cat you add, you add extra buffering space (at least 64KiB more).

With pv -q -B 1g, pv will buffer up to 1GiB of data internally.

Those cat and pv will be quicker at reading their input than your terminal emulator, because they need to do far less work to process it; but if inotifywait is not quick enough to read/decode/format the events, some can still be dropped.

To minimize the chance of events being dropped, you can:
- raise the fs.inotify.max_queued_events sysctl (an example follows this list)
- avoid redirecting the inotifywait output to slow consumers, or add sufficient buffering if you do
- use inotifywait filters to only select the events you're interested in
- make sure inotifywait and the consumers of its output are not given a low priority (no nice-ing them)
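For instance, to check and raise the event queue limit (a sketch; 65536 is an arbitrary value of my choosing, the shown default is the 16384 mentioned above, and the write requires root):
$ sysctl fs.inotify.max_queued_events
fs.inotify.max_queued_events = 16384
# sysctl -w fs.inotify.max_queued_events=65536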