How can a pipe producer tell the pipe consumer that it has reached "End of File"? (unnamed pipe, not named pipe)

Tags: here-document, here-string, pipe

I have an application which requires a producer to send filenames to a consumer, and to have the producer indicate to the consumer when the last filename has been sent, i.e. that end of file has been reached.

For simplicity, in the following examples the producer is played by echo and printf, and the consumer by cat. I tried, without success, to extrapolate from the here-document method, using <<EOF to tell a producer wrapper (if such a thing exists) what to look for as the indication of end of file. If it worked, cat should filter EOF from the output.

Ex 1)

input

{
echo "Hello World!" 
printf '\x04' 
echo "EOF"
} <<EOF |\
cat

output

bash: warning: here-document at line 146 delimited by end-of-file (wanted `EOF')
Hello World!
EOF

Ex 2)

input

{ 
echo "Hello World!" 
printf '\x04' 
echo "EOF"
} |\
cat <<EOF

output

bash: warning: here-document at line 153 delimited by end-of-file (wanted `EOF')

Is it correct that the here-document method of indicating a delimiter only works for static text, and not for dynamically created text?

— the actual application —

inotifywait -m --format '%w%f' /Dir |  <consumer>

The consumer is waiting for files to be written to directory /Dir.
It would be nice if, when a file "/Dir/EOF" is written, the consumer detected a logical end-of-file condition, simply by writing the shell script as follows:

inotifywait -m --format '%w%f' /Dir |<</Dir/EOF  <consumer>

— In response to Giles answer —

Is it theoretically possible to implement

cat <<EOF
hello
world
EOF

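as the following?

SpecialSymbol="EOF"
{
    echo hello
    echo world
    echo "$SpecialSymbol"
} |\
while IFS= read -r Line; do
  # Stop at the sentinel line; quoting the right-hand side forces a literal match.
  if [[ $Line == "$SpecialSymbol" ]]; then
    break
  else
    echo "$Line"
  fi
done |\
cat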

By theoretically possible I mean "would it support existing usage patterns and only enable extra usage patterns which had previously been illegal syntax?" – meaning no existing legal code would be broken.

Best Answer

For a pipe, the end of file is seen by the consumer(s) once all the producers have closed their file descriptor to the pipe and the consumer has read all the data.

So, in:

{
  echo foo
  echo bar
} | cat

cat will see end-of-file as soon as the second echo terminates and cat has read both foo\n and bar\n. There's nothing more for you to do.
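
The consumer's side of this can be sketched with a read loop in place of cat. This is only an illustration: the function name consume and the file names are made up here, and the brace group stands in for any real producer.

```shell
# Hypothetical consumer: count lines until read reports end-of-file.
consume() {
  count=0
  while IFS= read -r name; do
    count=$((count + 1))
  done
  # read returned non-zero: every writer has closed its end of the pipe.
  echo "got $count names, then EOF"
}

# Stand-in producer (file names invented for the example).
{ echo a.txt; echo b.txt; } | consume
```

The loop ends because read returns a non-zero status at end-of-file; the producer does not have to send any marker.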

One thing to bear in mind, though: if any command on the left side of the pipe starts a background process, that process inherits a file descriptor to the pipe (its stdout), so cat will not see end-of-file until that process also exits or closes its stdout. For example:

{
  echo foo
  sleep 10 &
  echo bar
} | cat

You will see that cat does not return until the 10 seconds have passed.

Here, you may want to redirect sleep's stdout to something else like /dev/null if you don't want its (non)output to be fed to cat:

{
  echo foo
  sleep 10 > /dev/null &
  echo bar
} | cat
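
The difference between the two variants can be measured with a rough sketch like the one below. The helper names (time_until_eof, holds_pipe, releases_pipe) are hypothetical, the sleep is shortened to 2 seconds, and date +%s is assumed to be available (it is common but not required by POSIX).

```shell
# Hypothetical helper: print how many whole seconds cat needs to see EOF
# when the given command is the producer.
time_until_eof() {
  start=$(date +%s)
  "$@" | cat > /dev/null
  end=$(date +%s)
  echo $((end - start))
}

# The background sleep inherits the pipe as its stdout.
holds_pipe()    { echo foo; sleep 2 & echo bar; }
# The background sleep has its stdout redirected away from the pipe.
releases_pipe() { echo foo; sleep 2 > /dev/null & echo bar; }

echo "background job keeps stdout: $(time_until_eof holds_pipe)s"
echo "background job redirected:   $(time_until_eof releases_pipe)s"
```

The first figure should be about the length of the sleep, the second close to zero; exact values depend on where the second boundaries fall.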

If you want the writing end of the pipe to be closed before the last command in the subshell on the left of the | runs, you can close stdout or redirect it partway through that subshell with exec, like:

{
  echo foo
  exec > /dev/null
  sleep 10
} | (cat; echo "cat is now gone")

However, note that most shells will still wait for that subshell in addition to the cat command. So while you'll see "cat is now gone" straight away (after foo is read), you'll still have to wait 10 seconds for the whole pipeline to finish. Of course, in the example above, it would make more sense to write:

echo foo | cat
sleep 10

<<ANYTHING...content...ANYTHING is a here-document: it makes the stdin of the command a file that contains the content, so it would not be useful here. \4 is the byte (Ctrl-D) that, when received from a terminal, makes the terminal device flush the data it holds to the application reading from it (and when there is no data, read() returns 0, which means end-of-file). Again, that is of no use on a pipe.
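
Since neither a here-document nor a \4 byte helps on a pipe, an in-band marker such as the /Dir/EOF name from the question has to be recognised by a reader. A minimal sketch, assuming a filter placed between producer and consumer; until_sentinel is a hypothetical name, and printf stands in for inotifywait so the example runs anywhere:

```shell
# Hypothetical filter: copy stdin to stdout until a line equal to $1 arrives.
until_sentinel() {
  sentinel=$1
  while IFS= read -r line; do
    [ "$line" = "$sentinel" ] && break
    printf '%s\n' "$line"
  done
}

# Stand-in for: inotifywait -m --format '%w%f' /Dir | until_sentinel /Dir/EOF | <consumer>
printf '%s\n' /Dir/a /Dir/b /Dir/EOF /Dir/c | until_sentinel /Dir/EOF
```

When the filter returns, its stdout is closed, so whatever reads from it sees an ordinary end-of-file. A long-running producer such as inotifywait -m would keep running until its next write to the now reader-less pipe fails.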

Related Solutions

How to tell if the pipe buffer is full

I would trace your Perl script with a system call trace tool: strace (Linux), dtruss (OS X), ktrace (FreeBSD), truss (Solaris), etc. The goal would be to see how much time your Perl script spends waiting on reading from its stdin and how much time the other program spends waiting on writing to its stdout.

Here I'm testing this out with the writer as the bottleneck:

terminal 1$ gzip -c < /dev/urandom | cat > /dev/null

terminal 2$ ps auxw | egrep 'gzip|cat'
slamb    25311 96.0  0.0  2852  516 pts/0    R+   23:35   3:40 gzip -c
slamb    25312  0.8  0.0  2624  336 pts/0    S+   23:35   0:01 cat

terminal 2$ strace -p 25312 -s 0 -rT -e trace=read
Process 25312 attached - interrupt to quit
     0.000000 read(0, ""..., 4096) = 4096 <0.005207>
     0.005531 read(0, ""..., 4096) = 4096 <0.000051>

The first number here is the time since the start of the previous syscall, and the last number is the time spent in the syscall. So we can post-process with Perl a bit to aggregate it... [*]

terminal 2$ strace -p 25312 -s 0 -rT -e trace=read 2>&1 | perl -nle 'm{^\s*([\d.]+) read\(0, .*<([\d.]+)>} or next; $total_work += $1 - $last_wait; $total_wait += $2; $last_wait = $2; print "working for $total_work sec, waiting for $total_wait sec"; $last_wait = $2;'
working for 0 sec, waiting for 0.005592 sec
...
working for 0.305356 sec, waiting for 2.28624900000002 sec
...

terminal 2$ strace -p 25311 -s 0 -rT -e trace=write 2>&1 | perl -nle 'm{^\s*([\d.]+) write\(1, .*<([\d.]+)>} or next; $total_work += $1 - $last_wait; $total_wait += $2; $last_wait = $2; print "working for $total_work sec, waiting for $total_wait sec"; $last_wait = $2;'
...
working for 5.15862000000001 sec, waiting for 0.0555740000000007 sec
...

You could go fancier and make a SystemTap or DTrace script that traces both sides at once, only tracks the correct file descriptor, and prints a nice status update every second or so with what percent of time each was waiting for the other.

[*] - Warning: my crude aggregation isn't quite right if read/write is being called on other file descriptors; it will underestimate the work time in that case.
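
For a saved trace (for example one captured with strace's -o option), the same crude aggregation can be sketched in awk. This is an approximation of the Perl one-liner above, not a replacement for it: summarize_read_waits is a hypothetical name, it assumes lines in exactly the format shown earlier, and it prints a single total at the end rather than a running total.

```shell
# Hypothetical helper: total work and wait time from "strace -rT -e trace=read" lines.
summarize_read_waits() {
  awk '
    / read\(0, / && /<[0-9.]+>/ {
      wait = $NF                  # last field looks like <0.005207>
      gsub(/[<>]/, "", wait)      # strip the angle brackets
      work += $1 - last_wait      # time between syscalls minus the previous wait
      total_wait += wait
      last_wait = wait
    }
    END { printf "working for %.6f sec, waiting for %.6f sec\n", work, total_wait }
  '
}

# The two sample lines from the strace output above.
printf '%s\n' \
  '     0.000000 read(0, ""..., 4096) = 4096 <0.005207>' \
  '     0.005531 read(0, ""..., 4096) = 4096 <0.000051>' |
  summarize_read_waits
```

The caveat in the footnote applies here as well: reads on other file descriptors are ignored, so the work time is underestimated when they occur.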

The dtrace version is pretty neat actually.

terminal 1$ gzip -c < /dev/urandom | cat > /dev/null

terminal 2$ ps aux | egrep 'gzip| cat'
slamb    54189  95.8  0.0   591796    584 s006  R+   12:49AM  22:49.55 gzip -c
slamb    54190   0.4  0.0   599828    392 s006  S+   12:49AM   0:06.08 cat

terminal 2$ cat > pipe.d <<'EOF'
#!/usr/sbin/dtrace -qs

BEGIN
{
  start = timestamp;
  writer_blocked = 0;
  reader_blocked = 0;
}

tick-1s, END
{
  this->elapsed = timestamp - start;
  printf("since startup, writer blocked %3d%% of time, reader %3d%% of time\n",
         100 * writer_blocked / this->elapsed,
         100 * reader_blocked / this->elapsed);
}

syscall::write:entry
/pid == $1 && arg0 == 1/
{
  self->entry = timestamp;
}

syscall::write:return
/pid == $1 && self->entry != 0/
{
  writer_blocked += timestamp - self->entry;
  self->entry = 0;
}

syscall::read:entry
/pid == $2 && arg0 == 0/
{
  self->entry = timestamp;
}

syscall::read:return
/pid == $2 && self->entry != 0/
{
  reader_blocked += timestamp - self->entry;
  self->entry = 0;
}
EOF

terminal 2$ chmod u+x pipe.d
terminal 2$ sudo ./pipe.d 54189 54190
since startup, writer blocked   0% of time, reader  98% of time
since startup, writer blocked   0% of time, reader  99% of time
since startup, writer blocked   0% of time, reader  99% of time
since startup, writer blocked   0% of time, reader  99% of time
since startup, writer blocked   0% of time, reader  99% of time
^C
since startup, writer blocked   0% of time, reader  99% of time

And the SystemTap version:

terminal 1$ gzip -c /dev/urandom | cat > /dev/null

terminal 2$ ps auxw | egrep 'gzip| cat'
slamb     3405  109  0.0   4356   584 pts/1    R+   02:57   0:04 gzip -c /dev/urandom
slamb     3406  0.2  0.0  10848   588 pts/1    S+   02:57   0:00 cat

terminal 2$ cat > probes.stp <<'EOF'
#!/usr/bin/env stap

global start
global writer_pid
global writes
global reader_pid
global reads

probe begin {
  start = gettimeofday_us()
  writer_pid = strtol(argv[1], 10)
  reader_pid = strtol(argv[2], 10)
}

probe timer.s(1), end {
  elapsed = gettimeofday_us() - start
  printf("since startup, writer blocked %3d%% of time, reader %3d%% of time\n",
         100 * @sum(writes) / elapsed,
         100 * @sum(reads) / elapsed)
}

probe syscall.write.return {
  if (pid() == writer_pid && $fd == 1)
    writes <<< gettimeofday_us() - @entry(gettimeofday_us())
}

probe syscall.read.return {
  if (pid() == reader_pid && $fd == 0)
    reads <<< gettimeofday_us() - @entry(gettimeofday_us())
}
EOF

terminal 2$ chmod a+x probes.stp
terminal 2$ sudo ./probes.stp 3405 3406
since startup, writer blocked   0% of time, reader  99% of time
...

Why does a named pipe not get deleted after system restart

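No, they're written to disk. The command mkfifo pipe21 creates a corresponding special file on your filesystem, and that entry persists across reboots until it is removed (only the entry persists; data passing through the pipe is never stored on disk). Special files are often kept under /dev, but named pipes (a.k.a. FIFOs) don't have to be kept in that directory.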

Excerpt from the Wikipedia article:

The named pipe can be deleted just like any file:
$ rm my_pipe

Example

Make a FIFO:

$ pwd
/home/saml

$ mkfifo pipe21

Check out the FIFO device:

$ ls -l | grep pipe
prw-rw-r--   1 saml saml        0 Jul 24 12:22 pipe21

$ file pipe21 
pipe21: fifo (named pipe)

Delete the device:

$ rm pipe21 

$ ls -l | grep pipe
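
The same life cycle can be scripted. The sketch below is illustrative only: fifo_demo is a hypothetical name, and mktemp -d is assumed to be available to provide a scratch directory. It also shows that the FIFO entry is still there after data has passed through it, until it is explicitly removed.

```shell
# Hypothetical demo: create a FIFO, pass one line through it, then remove it.
fifo_demo() {
  dir=$(mktemp -d) || return 1
  fifo=$dir/pipe21
  mkfifo "$fifo"
  echo hello > "$fifo" &   # opening for writing blocks until a reader opens the FIFO
  cat "$fifo"              # prints hello, then sees EOF when the writer closes
  wait                     # reap the background writer
  [ -p "$fifo" ] && echo "still a FIFO after use"
  rm "$fifo"
  [ -e "$fifo" ] || echo "removed"
  rmdir "$dir"
}

fifo_demo
```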

References

Named Pipes - Wikipedia