I am piping output from one program into some Perl I wrote. This is a long-running process, sometimes taking days, so I want to find my bottlenecks and try to open them up. I want to know whether data is being piped into my script faster than my script can process it. If so, I will try to tune my script, but not if I don't need to. I have seen talk of a flag that is set when the buffer is full, preventing further writes, but how do I check whether or how often that flag is set? Any ideas?
How to tell if the pipe buffer is full
pipe
Related Solutions
The capacity of a pipe buffer varies across systems (and can even vary on the same system). I am not sure there is a quick, easy, cross-platform way to just look up the capacity of a pipe.
Mac OS X, for example, uses a capacity of 16384 bytes by default, but can switch to 65536-byte capacities if large writes are made to the pipe, or will switch to a capacity of a single system page if too much kernel memory is already being used by pipe buffers (see xnu/bsd/sys/pipe.h and xnu/bsd/kern/sys_pipe.c; since these are from FreeBSD, the same behavior may happen there, too).
The Linux pipe(7) man page says that pipe capacity is 65536 bytes since Linux 2.6.11 and a single system page prior to that (e.g. 4096 bytes on (32-bit) x86 systems). The code (include/linux/pipe_fs_i.h and fs/pipe.c) seems to use 16 system pages (i.e. 64 KiB if a system page is 4 KiB), but the buffer for each pipe can be adjusted via fcntl on the pipe (up to a maximum capacity that defaults to 1048576 bytes, but can be changed via /proc/sys/fs/pipe-max-size).
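As a quick cross-check on any POSIX system, you can measure a pipe's capacity from inside a single process: put the write end in non-blocking mode and write until the kernel answers EAGAIN. A minimal Perl sketch (the 4096-byte chunk size is an arbitrary choice):

```perl
#!/usr/bin/perl
# Sketch: measure a pipe's capacity by making the write end non-blocking
# and writing until the kernel returns EAGAIN.
use strict; use warnings;
use Fcntl qw(F_GETFL F_SETFL O_NONBLOCK);
use POSIX qw(EAGAIN);

pipe(my $r, my $w) or die "pipe: $!";
my $flags = fcntl($w, F_GETFL, 0) or die "fcntl F_GETFL: $!";
fcntl($w, F_SETFL, $flags | O_NONBLOCK) or die "fcntl F_SETFL: $!";

my $total = 0;
while (1) {
    my $n = syswrite($w, "a" x 4096);
    if (!defined $n) {
        last if $! == EAGAIN;     # buffer is full
        die "syswrite: $!";
    }
    $total += $n;
}
print "pipe capacity is about $total bytes\n";
```

This avoids needing a second process to drain or close the read end, at the cost of only seeing the capacity your system grants a fresh, idle pipe.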
Here is a little bash/perl combination that I used to test the pipe capacity on my system:
#!/bin/bash
test $# -ge 1 || { echo "usage: $0 write-size [wait-time]"; exit 1; }
test $# -ge 2 || set -- "$@" 1
bytes_written=$(
    {
        exec 3>&1
        {
            perl -e '
                $size = $ARGV[0];
                $block = q(a) x $size;
                $num_written = 0;
                sub report { print STDERR $num_written * $size, qq(\n); }
                report;
                while (defined syswrite STDOUT, $block) {
                    $num_written++; report;
                }
            ' "$1" 2>&3
        } | (sleep "$2"; exec 0<&-);
    } | tail -1
)
printf "write size: %10d; bytes successfully before error: %d\n" \
"$1" "$bytes_written"
Here is what I found running it with various write sizes on a Mac OS X 10.6.7 system (note the change for writes larger than 16 KiB):
% /bin/bash -c 'for p in {0..18}; do /tmp/ts.sh $((2 ** $p)) 0.5; done'
write size: 1; bytes successfully before error: 16384
write size: 2; bytes successfully before error: 16384
write size: 4; bytes successfully before error: 16384
write size: 8; bytes successfully before error: 16384
write size: 16; bytes successfully before error: 16384
write size: 32; bytes successfully before error: 16384
write size: 64; bytes successfully before error: 16384
write size: 128; bytes successfully before error: 16384
write size: 256; bytes successfully before error: 16384
write size: 512; bytes successfully before error: 16384
write size: 1024; bytes successfully before error: 16384
write size: 2048; bytes successfully before error: 16384
write size: 4096; bytes successfully before error: 16384
write size: 8192; bytes successfully before error: 16384
write size: 16384; bytes successfully before error: 16384
write size: 32768; bytes successfully before error: 65536
write size: 65536; bytes successfully before error: 65536
write size: 131072; bytes successfully before error: 0
write size: 262144; bytes successfully before error: 0
The same script on Linux 3.19:
/bin/bash -c 'for p in {0..18}; do /tmp/ts.sh $((2 ** $p)) 0.5; done'
write size: 1; bytes successfully before error: 65536
write size: 2; bytes successfully before error: 65536
write size: 4; bytes successfully before error: 65536
write size: 8; bytes successfully before error: 65536
write size: 16; bytes successfully before error: 65536
write size: 32; bytes successfully before error: 65536
write size: 64; bytes successfully before error: 65536
write size: 128; bytes successfully before error: 65536
write size: 256; bytes successfully before error: 65536
write size: 512; bytes successfully before error: 65536
write size: 1024; bytes successfully before error: 65536
write size: 2048; bytes successfully before error: 65536
write size: 4096; bytes successfully before error: 65536
write size: 8192; bytes successfully before error: 65536
write size: 16384; bytes successfully before error: 65536
write size: 32768; bytes successfully before error: 65536
write size: 65536; bytes successfully before error: 65536
write size: 131072; bytes successfully before error: 0
write size: 262144; bytes successfully before error: 0
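On Linux specifically, that fcntl interface makes the capacity directly queryable and adjustable. A sketch (Linux-only; the values F_SETPIPE_SZ = 1031 and F_GETPIPE_SZ = 1032 are hard-coded from fcntl(2) because older Fcntl modules do not export them):

```perl
#!/usr/bin/perl
# Linux-only sketch: query (and grow) a pipe's capacity via fcntl.
# F_SETPIPE_SZ/F_GETPIPE_SZ constants hard-coded from fcntl(2).
use strict; use warnings;
my ($F_SETPIPE_SZ, $F_GETPIPE_SZ) = (1031, 1032);

pipe(my $r, my $w) or die "pipe: $!";
my $cap = fcntl($w, $F_GETPIPE_SZ, 0)
    or die "F_GETPIPE_SZ failed (non-Linux kernel?): $!";
print "default capacity: $cap bytes\n";   # typically 65536

fcntl($w, $F_SETPIPE_SZ, 1 << 17)         # ask for 128 KiB
    or die "F_SETPIPE_SZ failed: $!";
my $new = fcntl($w, $F_GETPIPE_SZ, 0);
print "new capacity: $new bytes\n";
```

Unprivileged processes can grow a pipe up to the /proc/sys/fs/pipe-max-size limit mentioned above; the kernel rounds requests up to a power-of-two number of pages.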
Note: the PIPE_BUF value defined in the C header files (and the pathconf value for _PC_PIPE_BUF) does not specify the capacity of pipes, but the maximum number of bytes that can be written atomically (see POSIX write(2)).
Quote from include/linux/pipe_fs_i.h:
/* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual
memory allocation, whereas PIPE_BUF makes atomicity guarantees. */
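That atomic-write limit can also be queried at runtime. A small Perl sketch using fpathconf from the POSIX module (the _PC_PIPE_BUF constant is standard POSIX, so this should be portable):

```perl
#!/usr/bin/perl
# Sketch: query PIPE_BUF (the atomic-write limit, NOT the capacity)
# for a live pipe via fpathconf.
use strict; use warnings;
use POSIX qw(fpathconf _PC_PIPE_BUF);

pipe(my $r, my $w) or die "pipe: $!";
my $pipe_buf = fpathconf(fileno($w), _PC_PIPE_BUF);
print "PIPE_BUF: $pipe_buf bytes\n";  # 4096 on Linux; POSIX requires >= 512
```

Comparing this value with the measured capacities above makes the distinction concrete: on Linux, PIPE_BUF is 4096 while the default capacity is 65536.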
Following this, one can very well make that last plan of yours work. For the command-to-be-sent not to be processed by the sending shell, it has to reach the pipe as a string (thus echo "command", not echo `command`). Then it has to be read by a background process (like a daemon, but not necessarily one) started in the appropriate terminal, and evaluated by that same process.
But it is boilerplate-y to have a script per pipe, so let's generalize by making a script term-pipe-r.sh (don't forget to chmod +x it!):
#!/bin/bash
pipe=$1                      # the pipe name is the first argument
trap 'rm -f "$pipe"' EXIT    # delete the pipe when the script exits

if [[ ! -p $pipe ]]; then    # if the pipe doesn't exist, create it
    mkfifo "$pipe"
fi

while true                   # cycle eternally...
do
    if read line < "$pipe"; then
        if [[ "$line" == 'close the term-pipe pipe' ]]; then
            break            # on the pipe-closing message, leave the loop
        fi
        echo                 # a line break, because of the prompt
        eval "$line"         # run the line: as this script is started in
    fi                       # the target terminal, the line runs there
done

echo "<pipe closing message>"   # custom message at the end of the script
So say you want /dev/tty3 to receive commands: just go there and do
./term-pipe-r.sh tty3pipe & # $1 will be tty3pipe (in a new process)
And to send commands, from any terminal (even from itself):
echo "command" > tty3pipe
or to run a file there:
cat some-script.sh > tty3pipe
Note this piping ignores files like .bashrc, and the aliases in there, such as alias ls='ls --color'. Hope this helps someone out there.
Edit (note - advantage of non-daemon):
Above I said the pipe reader need not be a daemon; in fact, having checked the differences, it turns out it is much better to be a mere background process in this case. That way, when you close the terminal, the script also receives a signal (SIGHUP, SIGTERM, or whatever), its EXIT trap fires (see the line starting with trap in the script), and the pipe is deleted automatically, avoiding a useless process and file (and maybe more, if something were still redirecting to the now-useless pipe).
Edit (automation):
Still, it is boring to have to run a script you (I, at least) probably want most of the time. So let's automate it! It should start in any terminal, and one thing all of them read is .bashrc. Plus, it sucks to have to type ./term-pipe-r.sh. So one may do:
cd /bin # go to /bin, where Bash gets command names
ln -s /directory/of/term-pipe-r.sh tpr # call it tpr (terminal pipe reader)
Now to run it you'd only need tpr tty3pipe & in /dev/tty3 whenever you want. But why do that when you can have it done automatically? So this should be added to .bashrc. But wait: how will it know the pipe name? It can base the name on the TTY (which can be known with the tty command), using simple regexes in sed (and some tricks). What you should add to ~/.bashrc is then:
pipe="$(sed 's/\/dev\///' <<< `tty` | sed 's/\///')pipe"
# ^^^- take out '/dev/' and other '/', add 'pipe'
tpr $pipe & # start our script with the appropriate pipe name
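The same name derivation can be done with bash parameter expansion alone, no sed needed. A sketch (the /dev/pts/3 path is a stand-in for the actual output of tty):

```shell
dev=/dev/pts/3           # stand-in for "$(tty)"
pipe=${dev#/dev/}        # strip the leading /dev/   -> pts/3
pipe=${pipe//\//}pipe    # strip remaining slashes   -> pts3pipe
echo "$pipe"
```

This avoids spawning two sed processes on every shell startup, which matters little in practice but keeps .bashrc self-contained.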
Best Answer
I would trace your Perl script with a system-call trace tool: strace (Linux), dtruss (OS X), ktrace (FreeBSD), truss (Solaris), etc. The goal is to see how much time your Perl script spends waiting on reading from its stdin and how much time the other program spends waiting on writing to its stdout. On Linux, for example, strace -r -T -e trace=read,write -p <pid> attaches to a running process: with -r each line starts with the time elapsed since the previous syscall, and with -T it ends with the time spent inside the syscall. So we can post-process with Perl a bit to aggregate it... [*]
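One possible shape for that aggregation (my own sketch, not the original post-processing script; the sample lines are made-up examples in the strace -r -T format, where the leading number is the gap since the previous syscall and the trailing <...> is the time inside it):

```perl
#!/usr/bin/perl
# Sketch: aggregate `strace -r -T` lines into time spent inside each
# syscall vs. time spent between calls. Assumed line format:
#     0.000100 read(0, "aaaa", 4096) = 4096 <0.000500>
use strict; use warnings;

sub aggregate {
    my %t;
    for my $line (@_) {
        next unless $line =~ /^\s*([\d.]+)\s+(\w+)\(.*<([\d.]+)>/;
        $t{$2}{in}      += $3;   # time blocked inside the syscall
        $t{$2}{between} += $1;   # gap since the previous syscall
    }
    return \%t;
}

# Made-up sample lines, only to exercise the parser:
my $t = aggregate(
    q{0.000100 read(0, "aaaa", 4096) = 4096 <0.000500>},
    q{0.000050 write(1, "aaaa", 4096) = 4096 <0.000020>},
    q{0.000120 read(0, "bbbb", 4096) = 4096 <0.000400>},
);
printf "%-6s in-syscall %.6fs, between %.6fs\n",
    $_, $t->{$_}{in}, $t->{$_}{between} for sort keys %$t;
```

Feeding it real strace output (e.g. via strace -r -T ... 2>&1 | perl aggregate.pl) shows at a glance whether the process spends its time blocked in read (starved by the writer) or working between reads (itself the bottleneck).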
You could go fancier and make a SystemTap or DTrace script that traces both sides at once, tracks only the correct file descriptor, and prints a nice status update every second or so with what percent of time each side spent waiting for the other.
[*] - Warning: my crude aggregation isn't quite right if read/write is being called on other file descriptors; it will underestimate the work time in that case.
The DTrace version is pretty neat, actually, and a SystemTap version can be structured the same way; both can aggregate the per-descriptor wait times in-kernel, so no post-processing step is needed.