I would trace your Perl script with a system call trace tool: strace (Linux), dtruss (OS X), ktrace (FreeBSD), truss (Solaris), etc. The goal would be to see how much time your Perl script spends waiting on reading from its stdin and how much time the other program spends waiting on writing to its stdout.
Here I'm testing this out with the writer as the bottleneck:
terminal 1$ gzip -c < /dev/urandom | cat > /dev/null
terminal 2$ ps auxw | egrep 'gzip|cat'
slamb 25311 96.0 0.0 2852 516 pts/0 R+ 23:35 3:40 gzip -c
slamb 25312 0.8 0.0 2624 336 pts/0 S+ 23:35 0:01 cat
terminal 2$ strace -p 25312 -s 0 -rT -e trace=read
Process 25312 attached - interrupt to quit
0.000000 read(0, ""..., 4096) = 4096 <0.005207>
0.005531 read(0, ""..., 4096) = 4096 <0.000051>
The first number here is the time since the start of the previous syscall, and the last number is the time spent in the syscall. So we can post-process with Perl a bit to aggregate it... [*]
terminal 2$ strace -p 25312 -s 0 -rT -e trace=read 2>&1 | perl -nle 'm{^\s*([\d.]+) read\(0, .*<([\d.]+)>} or next; $total_work += $1 - $last_wait; $total_wait += $2; $last_wait = $2; print "working for $total_work sec, waiting for $total_wait sec";'
working for 0 sec, waiting for 0.005592 sec
...
working for 0.305356 sec, waiting for 2.28624900000002 sec
...
terminal 2$ strace -p 25311 -s 0 -rT -e trace=write 2>&1 | perl -nle 'm{^\s*([\d.]+) write\(1, .*<([\d.]+)>} or next; $total_work += $1 - $last_wait; $total_wait += $2; $last_wait = $2; print "working for $total_work sec, waiting for $total_wait sec";'
...
working for 5.15862000000001 sec, waiting for 0.0555740000000007 sec
...
You could go fancier and write a SystemTap or DTrace script that traces both sides at once, only tracks the correct file descriptor, and prints a status update every second or so showing what percentage of the time each side was waiting for the other.
[*] - Warning: my crude aggregation isn't quite right if read/write is being called on other file descriptors; it will underestimate the work time in that case.
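One possible way around that caveat, as a rough sketch rather than a tested replacement for the one-liner: keep a running sum of the -r deltas over every traced line, so that time spent in calls on other file descriptors is counted as work instead of being lost. The function name summarize_strace and its two arguments (syscall name, file descriptor) are made up for this illustration. It assumes single-process strace output in the -rT format shown above, with no [pid N] prefixes.

```shell
# Hypothetical helper: summarize_strace SYSCALL FD
# Reads `strace -rT` output on stdin and prints running totals, like the
# Perl one-liner, but sums the -r deltas across all lines so that calls on
# other file descriptors do not shorten the measured work time.
summarize_strace() {
  awk -v call="$1" -v fd="$2" '
    { since += $1 }                      # -r delta: time since the previous traced call began
    $2 == (call "(" fd ",") {
      dur = substr($NF, 2, length($NF) - 2) + 0   # -T duration with the angle brackets stripped
      worked += since - last_wait        # gap since the last matching call, minus its blocked time
      waited += dur
      last_wait = dur
      since = 0
      printf "working for %.6f sec, waiting for %.6f sec\n", worked, waited
    }
  '
}
```

It would be used in place of the Perl stage, for example: strace -p 25312 -s 0 -rT -e trace=read 2>&1 | summarize_strace read 0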
The dtrace version is pretty neat actually.
terminal 1$ gzip -c < /dev/urandom | cat > /dev/null
terminal 2$ ps aux | egrep 'gzip| cat'
slamb 54189 95.8 0.0 591796 584 s006 R+ 12:49AM 22:49.55 gzip -c
slamb 54190 0.4 0.0 599828 392 s006 S+ 12:49AM 0:06.08 cat
terminal 2$ cat > pipe.d <<'EOF'
#!/usr/sbin/dtrace -qs

BEGIN
{
    start = timestamp;
    writer_blocked = 0;
    reader_blocked = 0;
}

tick-1s, END
{
    this->elapsed = timestamp - start;
    printf("since startup, writer blocked %3d%% of time, reader %3d%% of time\n",
           100 * writer_blocked / this->elapsed,
           100 * reader_blocked / this->elapsed);
}

syscall::write:entry
/pid == $1 && arg0 == 1/
{
    self->entry = timestamp;
}

syscall::write:return
/pid == $1 && self->entry != 0/
{
    writer_blocked += timestamp - self->entry;
    self->entry = 0;
}

syscall::read:entry
/pid == $2 && arg0 == 0/
{
    self->entry = timestamp;
}

syscall::read:return
/pid == $2 && self->entry != 0/
{
    reader_blocked += timestamp - self->entry;
    self->entry = 0;
}
EOF
terminal 2$ chmod u+x pipe.d
terminal 2$ sudo ./pipe.d 54189 54190
since startup, writer blocked 0% of time, reader 98% of time
since startup, writer blocked 0% of time, reader 99% of time
since startup, writer blocked 0% of time, reader 99% of time
since startup, writer blocked 0% of time, reader 99% of time
since startup, writer blocked 0% of time, reader 99% of time
^C
since startup, writer blocked 0% of time, reader 99% of time
And the SystemTap version:
terminal 1$ gzip -c /dev/urandom | cat > /dev/null
terminal 2$ ps auxw | egrep 'gzip| cat'
slamb 3405 109 0.0 4356 584 pts/1 R+ 02:57 0:04 gzip -c /dev/urandom
slamb 3406 0.2 0.0 10848 588 pts/1 S+ 02:57 0:00 cat
terminal 2$ cat > probes.stp <<'EOF'
#!/usr/bin/env stap

global start
global writer_pid
global writes
global reader_pid
global reads

probe begin {
    start = gettimeofday_us()
    writer_pid = strtol(argv[1], 10)
    reader_pid = strtol(argv[2], 10)
}

probe timer.s(1), end {
    elapsed = gettimeofday_us() - start
    printf("since startup, writer blocked %3d%% of time, reader %3d%% of time\n",
           100 * @sum(writes) / elapsed,
           100 * @sum(reads) / elapsed)
}

probe syscall.write.return {
    if (pid() == writer_pid && $fd == 1)
        writes <<< gettimeofday_us() - @entry(gettimeofday_us())
}

probe syscall.read.return {
    if (pid() == reader_pid && $fd == 0)
        reads <<< gettimeofday_us() - @entry(gettimeofday_us())
}
EOF
terminal 2$ chmod a+x probes.stp
terminal 2$ sudo ./probes.stp 3405 3406
since startup, writer blocked 0% of time, reader 99% of time
...
No, they're written to disk. The command mkfifo pipe21 creates the corresponding device on your filesystem. Often these devices are kept under /dev, but named pipes (a.k.a. FIFOs) don't necessarily have to be kept in this directory.
Excerpt from the Wikipedia article:
The named pipe can be deleted just like any file:
$ rm my_pipe
Example
Make a FIFO:
$ pwd
/home/saml
$ mkfifo pipe21
Check out the FIFO device:
$ ls -l | grep pipe
prw-rw-r-- 1 saml saml 0 Jul 24 12:22 pipe21
$ file pipe21
pipe21: fifo (named pipe)
Delete the device:
$ rm pipe21
$ ls -l | grep pipe
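The same steps can be put into a small script, as a minimal sketch. The scratch directory from mktemp and the variable names are assumptions made for the illustration, not part of the session above. It checks that the FIFO exists as a filesystem entry, passes one line through it, and confirms that rm removes it like any other file.

```shell
# Sketch: a FIFO is a filesystem entry, created and removed like a file.
dir=$(mktemp -d)            # scratch directory, so nothing in $HOME is touched
fifo="$dir/pipe21"

mkfifo "$fifo"
if [ -p "$fifo" ]; then kind=fifo; else kind=other; fi

# A writer blocks until a reader opens the FIFO, so start it in the background.
echo hello > "$fifo" &
msg=$(cat "$fifo")          # reads until the writer closes its end
wait                        # reap the background writer

rm "$fifo"
if [ -e "$fifo" ]; then gone=no; else gone=yes; fi
rmdir "$dir"

echo "kind=$kind msg=$msg gone=$gone"
```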
Best Answer
For a pipe, end-of-file is seen by the consumer(s) once all the producers have closed their file descriptors to the pipe and the consumer has read all the data.
So, when the left side of the pipe runs two echo commands in a row and the right side is cat, cat will see end-of-file as soon as the second echo terminates and cat has read both foo\n and bar\n. There's nothing more for you to do.
One thing to bear in mind, though: if one of the commands on the left side of the pipe starts a background process, that background process will inherit a fd to the pipe (its stdout), so cat will not see eof until that process also dies or closes its stdout. For example, with a sleep 10 started in the background there, cat does not return before 10 seconds have passed.
In that case, you may want to redirect sleep's stdout to something else, like /dev/null, if you don't want its (non)output to be fed to cat.
If you want the writing end of the pipe to be closed before the last command in the subshell left of the | is run, you can close stdout, or redirect it elsewhere, in the middle of that subshell with exec.
However, note that most shells will still wait for that subshell in addition to the cat command. So while you'll see "cat is now gone" straight away (after foo is read), you'll still have to wait 10 seconds for the whole pipeline to finish. Of course, that example would make more sense written differently.
<<ANYTHING...content...ANYTHING is a here-document; it makes the stdin of the command a file that contains the content. It wouldn't be useful there.
\4 is a byte that, when read from a terminal, makes the data held by the terminal device be flushed to the application reading from it (and when there's no data, read() returns 0, which means end-of-file). Again, not of any use here.
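The background-process caveat can be tried out with a short sketch. The commands below are illustrative assumptions, not the answer's original examples: they use a 3-second sleep and the made-up variable names fast and slow, and time each pipeline with date +%s (whole seconds only).

```shell
# Case 1: the background sleep has its stdout redirected, so once the group
# on the left finishes, nothing holds the write end and cat sees EOF at once.
start=$(date +%s)
{ echo foo; sleep 3 > /dev/null & echo bar; } | cat > /dev/null
fast=$(( $(date +%s) - start ))

# Case 2: the background sleep inherits the pipe as its stdout, so cat does
# not see EOF until sleep exits.
start=$(date +%s)
{ echo foo; sleep 3 & echo bar; } | cat > /dev/null
slow=$(( $(date +%s) - start ))

echo "redirected: ${fast}s, inherited: ${slow}s"
```

The second pipeline should take about as long as the sleep, while the first should return almost immediately.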