How to tell if the pipe buffer is full

pipe

I am piping output from one program into some Perl I wrote. This is a long running process , sometimes days, so I want to find out where my bottlenecks are and try to open them up. I want to know if data is being piped into my script faster than my script can process it. If this is the case I will try to tune up my script, but not if I don't need to. I see talk about a flag being set when the buffer is full that prevents any more writes to it but how do I check to see if or how often that flag is set? Any ideas?

Best Answer

I would trace your Perl script with a system call trace tool: strace (Linux), dtruss (OS X), ktrace (FreeBSD), truss (Solaris), etc. The goal would be to see how much time your Perl script spends waiting on reading from its stdin and how much time the other program spends waiting on writing to its stdout.

Here I'm testing this out with the writer as the bottleneck:

terminal 1$ gzip -c < /dev/urandom | cat > /dev/null

terminal 2$ ps auxw | egrep 'gzip|cat'
slamb    25311 96.0  0.0  2852  516 pts/0    R+   23:35   3:40 gzip -c
slamb    25312  0.8  0.0  2624  336 pts/0    S+   23:35   0:01 cat

terminal 2$ strace -p 25312 -s 0 -rT -e trace=read
Process 25312 attached - interrupt to quit
     0.000000 read(0, ""..., 4096) = 4096 <0.005207>
     0.005531 read(0, ""..., 4096) = 4096 <0.000051>

The first number here is the time since the start of the previous syscall, and the last number is the time spent in the syscall. So we can post-process with Perl a bit to aggregate it... [*]

terminal 2$ strace -p 25312 -s 0 -rT -e trace=read 2>&1 | perl -nle 'm{^\s*([\d.]+) read\(0, .*<([\d.]+)>} or next; $total_work += $1 - $last_wait; $total_wait += $2; $last_wait = $2; print "working for $total_work sec, waiting for $total_wait sec"; $last_wait = $2;'
working for 0 sec, waiting for 0.005592 sec
...
working for 0.305356 sec, waiting for 2.28624900000002 sec
...

terminal 2$ strace -p 25311 -s 0 -rT -e trace=write 2>&1 | perl -nle 'm{^\s*([\d.]+) write\(1, .*<([\d.]+)>} or next; $total_work += $1 - $last_wait; $total_wait += $2; $last_wait = $2; print "working for $total_work sec, waiting for $total_wait sec"; $last_wait = $2;'
...
working for 5.15862000000001 sec, waiting for 0.0555740000000007 sec
...

You could go fancier and make a SystemTap or DTrace script that does traces both sides at once, only tracks the correct file descriptor, and prints a nice status update every second or so with what percent of time each was waiting for the other.

[*] - Warning: my crude aggregation isn't quite right if read/write is being called on other file descriptors; it will underestimate the work time in that case.

The dtrace version is pretty neat actually.

terminal 1$ gzip -c < /dev/urandom | cat > /dev/null

terminal 2$ ps aux | egrep 'gzip| cat'
slamb    54189  95.8  0.0   591796    584 s006  R+   12:49AM  22:49.55 gzip -c
slamb    54190   0.4  0.0   599828    392 s006  S+   12:49AM   0:06.08 cat

terminal 2$ cat > pipe.d <<'EOF'
#!/usr/sbin/dtrace -qs

BEGIN
{
  start = timestamp;
  writer_blocked = 0;
  reader_blocked = 0;
}

tick-1s, END
{
  this->elapsed = timestamp - start;
  printf("since startup, writer blocked %3d%% of time, reader %3d%% of time\n",
         100 * writer_blocked / this->elapsed,
         100 * reader_blocked / this->elapsed);
}

syscall::write:entry
/pid == $1 && arg0 == 1/
{
  self->entry = timestamp;
}

syscall::write:return
/pid == $1 && self->entry != 0/
{
  writer_blocked += timestamp - self->entry;
  self->entry = 0;
}

syscall::read:entry
/pid == $2 && arg0 == 0/
{
  self->entry = timestamp;
}

syscall::read:return
/pid == $2 && self->entry != 0/
{
  reader_blocked += timestamp - self->entry;
  self->entry = 0;
}
EOF

terminal 2$ chmod u+x pipe.d
terminal 2$ sudo ./pipe.d 54189 54190
since startup, writer blocked   0% of time, reader  98% of time
since startup, writer blocked   0% of time, reader  99% of time
since startup, writer blocked   0% of time, reader  99% of time
since startup, writer blocked   0% of time, reader  99% of time
since startup, writer blocked   0% of time, reader  99% of time
^C
since startup, writer blocked   0% of time, reader  99% of time

And the SystemTap version:

terminal 1$ gzip -c /dev/urandom | cat > /dev/null

terminal 2$ ps auxw | egrep 'gzip| cat'
slamb     3405  109  0.0   4356   584 pts/1    R+   02:57   0:04 gzip -c /dev/urandom
slamb     3406  0.2  0.0  10848   588 pts/1    S+   02:57   0:00 cat

terminal 2$ cat > probes.stp <<'EOF'
#!/usr/bin/env stap

global start
global writer_pid
global writes
global reader_pid
global reads

probe begin {
  start = gettimeofday_us()
  writer_pid = strtol(argv[1], 10)
  reader_pid = strtol(argv[2], 10)
}

probe timer.s(1), end {
  elapsed = gettimeofday_us() - start
  printf("since startup, writer blocked %3d%% of time, reader %3d%% of time\n",
         100 * @sum(writes) / elapsed,
         100 * @sum(reads) / elapsed)
}

probe syscall.write.return {
  if (pid() == writer_pid && $fd == 1)
    writes <<< gettimeofday_us() - @entry(gettimeofday_us())
}

probe syscall.read.return {
  if (pid() == reader_pid && $fd == 0)
    reads <<< gettimeofday_us() - @entry(gettimeofday_us())
}
EOF

terminal 2$ chmod a+x probes.stp
terminal 2$ sudo ./pipe.stp 3405 3406
since startup, writer blocked   0% of time, reader  99% of time
...

Related Solutions

Pipe – How Big is the Pipe Buffer?

The capacity of a pipe buffer varies across systems (and can even vary on the same system). I am not sure there is a quick, easy, and cross platform way to just lookup the capacity of a pipe.

Mac OS X, for example, uses a capacity of 16384 bytes by default, but can switch to 65336 byte capacities if large write are made to the pipe, or will switch to a capacity of a single system page if too much kernel memory is already being used by pipe buffers (see xnu/bsd/sys/pipe.h, and xnu/bsd/kern/sys_pipe.c; since these are from FreeBSD, the same behavior may happen there, too).

One Linux pipe(7) man page says that pipe capacity is 65536 bytes since Linux 2.6.11 and a single system page prior to that (e.g. 4096 bytes on (32-bit) x86 systems). The code (include/linux/pipe_fs_i.h, and fs/pipe.c) seems to use 16 system pages (i.e. 64 KiB if a system page is 4 KiB), but the buffer for each pipe can be adjusted via a fcntl on the pipe (up to a maximum capacity which defaults to 1048576 bytes, but can be changed via /proc/sys/fs/pipe-max-size)).

Here is a little bash/perl combination that I used to test the pipe capacity on my system:

#!/bin/bash
test $# -ge 1 || { echo "usage: $0 write-size [wait-time]"; exit 1; }
test $# -ge 2 || set -- "$@" 1
bytes_written=$(
{
    exec 3>&1
    {
        perl -e '
            $size = $ARGV[0];
            $block = q(a) x $size;
            $num_written = 0;
            sub report { print STDERR $num_written * $size, qq(\n); }
            report; while (defined syswrite STDOUT, $block) {
                $num_written++; report;
            }
        ' "$1" 2>&3
    } | (sleep "$2"; exec 0<&-);
} | tail -1
)
printf "write size: %10d; bytes successfully before error: %d\n" \
    "$1" "$bytes_written"

Here is what I found running it with various write sizes on a Mac OS X 10.6.7 system (note the change for writes larger than 16KiB):

% /bin/bash -c 'for p in {0..18}; do /tmp/ts.sh $((2 ** $p)) 0.5; done'
write size:          1; bytes successfully before error: 16384
write size:          2; bytes successfully before error: 16384
write size:          4; bytes successfully before error: 16384
write size:          8; bytes successfully before error: 16384
write size:         16; bytes successfully before error: 16384
write size:         32; bytes successfully before error: 16384
write size:         64; bytes successfully before error: 16384
write size:        128; bytes successfully before error: 16384
write size:        256; bytes successfully before error: 16384
write size:        512; bytes successfully before error: 16384
write size:       1024; bytes successfully before error: 16384
write size:       2048; bytes successfully before error: 16384
write size:       4096; bytes successfully before error: 16384
write size:       8192; bytes successfully before error: 16384
write size:      16384; bytes successfully before error: 16384
write size:      32768; bytes successfully before error: 65536
write size:      65536; bytes successfully before error: 65536
write size:     131072; bytes successfully before error: 0
write size:     262144; bytes successfully before error: 0

The same script on Linux 3.19:

/bin/bash -c 'for p in {0..18}; do /tmp/ts.sh $((2 ** $p)) 0.5; done'
write size:          1; bytes successfully before error: 65536
write size:          2; bytes successfully before error: 65536
write size:          4; bytes successfully before error: 65536
write size:          8; bytes successfully before error: 65536
write size:         16; bytes successfully before error: 65536
write size:         32; bytes successfully before error: 65536
write size:         64; bytes successfully before error: 65536
write size:        128; bytes successfully before error: 65536
write size:        256; bytes successfully before error: 65536
write size:        512; bytes successfully before error: 65536
write size:       1024; bytes successfully before error: 65536
write size:       2048; bytes successfully before error: 65536
write size:       4096; bytes successfully before error: 65536
write size:       8192; bytes successfully before error: 65536
write size:      16384; bytes successfully before error: 65536
write size:      32768; bytes successfully before error: 65536
write size:      65536; bytes successfully before error: 65536
write size:     131072; bytes successfully before error: 0
write size:     262144; bytes successfully before error: 0

Note: The PIPE_BUF value defined in the C header files (and the pathconf value for _PC_PIPE_BUF), does not specify the capacity of pipes, but the maximum number of bytes that can be written atomically (see POSIX write(2)).

Quote from include/linux/pipe_fs_i.h:

/* Differs from PIPE_BUF in that PIPE_SIZE is the length of the actual
   memory allocation, whereas PIPE_BUF makes atomicity guarantees.  */

How to pipe commands to any terminal

Following this, one can very well make that last plan of yours work. For the command to-be-sent not to be processed by the shell, it has to be in the form of a string when reaches the pipe (thus echo "command", not echo `command`). Then it has to be read by a background process (alike a daemon, but not necessarily) started in the appropriate terminal. It should be evaluated by the same process.

But it is boiler-platey to have a script per pipe. So let's generalize making a script as term-pipe-r.sh (don't forget to chmod +x it!):

#!/bin/bash

pipe=$1                     # the pipe name is the first argument
trap 'rm -f "$pipe"' EXIT     # ignore exit and delete messages until the end

if [[ ! -p $pipe ]]; then   # if the pipe doesn't exist, create it
    mkfifo $pipe
fi

while true                  # cycle eternally..
do
    if read line <$ pipe; then
        if [[ "$line" == 'close the term-pipe pipe' ]]; then
            break
            # if the pipe closing message is received, break the while cycle
        fi

        echo                # a line break should be used because of the prompt 
        eval $line          # run the line: as this script should be started
    fi                          # in the target terminal, 
done                            # the line will be run there.

echo "<pipe closing message>"   # custom message on the end of the script

So say you want /dev/tty3 to receive commands: just go there, do

./term-pipe-r.sh tty3pipe &     # $1 will be tty3pipe (in a new process)

And to send commands, from any terminal (even from itself):

echo "command" > tty3pipe

or to run a file there:

cat some-script.sh > tty3pipe

Note this piping ignores files like .bashrc, and the aliases in there, such as alias ls='ls --color'. Hope this helps someone out there.

Edit (note - advantage of non-daemon):

Above I talked about the pipe reader not being a daemon necessarily, but in fact, I checked the differences, and it turns out it is way better to be a mere background process in this case. Because this way, when you close the terminal, an EXIT signal (SIGHUP, SIGTERM, or whatever) is received by the script as well, and the pipe is deleted then (see the line starting with trap in the script) automatically, avoiding a useless process and file (and maybe others if there were such redirecting to the useless pipe).

Edit (automation):

Still, it is boring to have to run a script you (I, at least) probably want most of the time. So, let's automatize it! It should start in any terminal, and one thing all of them read is .bashrc. Plus, it sucks to have to use ./term-pipe-r.sh. So, one may do:

cd /bin # go to /bin, where Bash gets command names
ln -s /directory/of/term-pipe-r.sh tpr  # call it tpr (terminal pipe reader)

Now to run it you'd only need tpr tty3pipe & in /dev/tty3 whenever you'd want. But why do that when you can have it done automatically? So this should be added to .bashrc. But wait: how will it know the pipe name? It can base the name on the TTY (which can be know with the tty command), using simple REGEX's in sed (and some tricks). What you should add to ~/.bashrc will then be:

pipe="$(sed 's/\/dev\///' <<< `tty` | sed 's/\///')pipe"
                # ^^^- take out '/dev/' and other '/', add 'pipe'
tpr $pipe &     # start our script with the appropriate pipe name