Shell Pipes – How to Understand Pipes


When I only used pipes in bash, I didn't think much about them. But after reading some C code examples that use the pipe() system call together with fork(), I wonder how to understand pipes, both anonymous and named ones.

It is often said that "everything in Linux/Unix is a file". Is a pipe actually a file, so that one of the processes it connects writes to the pipe file and the other reads from it? If so, where is the file for an anonymous pipe created? In /tmp, /dev, or …?

However, from examples of named pipes, I also learned that pipes have a space and time advantage over explicitly using temporary files, probably because no files are involved in implementing pipes. Pipes also don't seem to store data the way files do. So I doubt a pipe is actually a file.

Best Answer

About your performance question: pipes are more efficient than files because no disk I/O is needed. So cmd1 | cmd2 is more efficient than cmd1 > tmpfile; cmd2 < tmpfile (this might not hold if tmpfile lives on a RAM disk or another memory-backed device such as a named pipe; but if it is a named pipe, cmd1 must run in the background, because opening the pipe for writing blocks until a reader appears and writes block once the pipe buffer is full). If you need to keep the output of cmd1 and still send it to cmd2, use cmd1 | tee tmpfile | cmd2, which lets cmd1 and cmd2 run in parallel and spares cmd2 from reading the data back from disk.
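A minimal sketch of the three variants side by side. The commands seq and wc -l are stand-ins for cmd1 and cmd2, and the mktemp-created temporary file is an illustrative assumption, not part of the answer:

```shell
#!/bin/sh
# Stand-ins: cmd1 = seq 1 100000, cmd2 = wc -l (illustrative choices).
tmpfile=$(mktemp)

# 1. Temporary file: cmd2 cannot start until cmd1 has finished writing the file.
seq 1 100000 > "$tmpfile"
via_file=$(wc -l < "$tmpfile")

# 2. Pipe: both commands run concurrently; data flows through a kernel buffer.
via_pipe=$(seq 1 100000 | wc -l)

# 3. tee: keep a copy of cmd1's output while still streaming it to cmd2.
via_tee=$(seq 1 100000 | tee "$tmpfile" | wc -l)
kept=$(wc -l < "$tmpfile")

echo "file: $via_file, pipe: $via_pipe, tee: $via_tee, kept copy: $kept"
rm -f "$tmpfile"
```

In the tee variant cmd2 starts consuming data immediately instead of re-reading the file after cmd1 has finished, which is the point the answer makes.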

Named pipes are useful when several processes read from or write to the same pipe. They are also useful when a program is not designed to use stdin/stdout and insists on reading or writing "files". I put "files" in quotes because, from a storage point of view, named pipes are not really files: the data lives in a fixed-size memory buffer, even though the pipe has a filesystem entry (for reference purposes). Other things in UNIX have filesystem entries without being files: just think of /dev/null or other entries in /dev or /proc.
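A sketch of a named pipe standing in for a file path, with two writer processes feeding one reader. The FIFO location under a mktemp -d directory and the use of sort as the program that wants a file argument are assumptions for illustration; the one-second sleep is a crude way to let both writers reach open() before the reader arrives:

```shell
#!/bin/sh
dir=$(mktemp -d)
fifo="$dir/demo.fifo"
mkfifo "$fifo"

# The FIFO has a filesystem entry (test -p checks for a named pipe)...
is_fifo=no
[ -p "$fifo" ] && is_fifo=yes

# ...but it stores nothing: each writer blocks in open() until a reader appears.
echo "from writer B" > "$fifo" &
echo "from writer A" > "$fifo" &
sleep 1

# sort is given a path, exactly as if it were a regular file.
# It sees EOF only after every writer has closed its end.
result=$(sort "$fifo")
wait

printf '%s\n' "$result"
rm -rf "$dir"
```

Each line is shorter than PIPE_BUF, so POSIX guarantees that the two writes are not interleaved.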

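As pipes (named and unnamed) have a fixed buffer size, reads and writes on them can block: a reader sleeps until data arrives, and a writer sleeps while the buffer is full until the reader drains it. Also, when do you receive EOF when reading from a memory buffer? The rules for this behavior are well defined and documented in the man pages (see pipe(7)): for example, a read returns EOF once every write end has been closed, and a write to a pipe whose read ends are all closed raises SIGPIPE (or fails with EPIPE if that signal is ignored).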
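Both of those rules can be observed from the shell. A small sketch; the temporary status file is only an illustrative way to get the exit status of the left-hand side of a pipeline:

```shell
#!/bin/sh
# Reader side: read blocks while the writer sleeps, then sees EOF once
# the writer exits and the write end of the pipe is closed.
lines=$( { echo one; sleep 1; echo two; } | {
  n=0
  while IFS= read -r line; do n=$((n + 1)); done
  echo "$n"   # only reached after EOF
} )

# Writer side: head exits after one line, closing the read end. The next
# write by yes raises SIGPIPE (or fails with EPIPE if SIGPIPE is ignored),
# so yes terminates instead of running forever.
statusfile=$(mktemp)
first=$( { status=0; yes || status=$?; echo "$status" > "$statusfile"; } | head -n 1 )
yes_status=$(cat "$statusfile")
rm -f "$statusfile"

echo "reader counted $lines lines; head printed '$first'; yes exited with status $yes_status"
```

With the default SIGPIPE disposition the shell reports yes's status as 128 + 13 = 141; if SIGPIPE is ignored, yes prints a "Broken pipe" error and exits with a small nonzero status instead.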

One thing you cannot do with pipes (named or unnamed) is seek back in the data: lseek() on a pipe fails with ESPIPE. Since they are implemented with a memory buffer whose data is discarded once read, this is understandable.

About "everything in Linux/Unix is a file", I do not agree. Named pipes have filesystem entries but are not exactly files. Unnamed pipes have no filesystem entries (on Linux they only show up in /proc, as pipe:[inode] links under /proc/<pid>/fd). However, most I/O on UNIX is done with read()/write() calls that take a file descriptor, and that includes unnamed pipes (and sockets). So I do not think we can say that "everything in Linux/Unix is a file", but we can surely say that "most I/O in Linux/Unix is done through a file descriptor".
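On Linux, you can see where an anonymous pipe "lives": it has no path, but the descriptor link in /proc points to a pseudo-name of the form pipe:[inode], and stat() reports it as a FIFO. A Linux-specific sketch, assuming /proc is mounted and readlink is available:

```shell
#!/bin/sh
# Linux-specific: inside a pipeline, fd 0 of the reading process is the read
# end of an anonymous pipe. /proc/self resolves to readlink's own process.
link=$(: | readlink /proc/self/fd/0)
echo "fd 0 of readlink points to: $link"   # a name like pipe:[<inode number>]

# test -p follows /dev/stdin and reports a FIFO, the same file type a
# named pipe has, even though this pipe has no path of its own.
kind=$(: | { if [ -p /dev/stdin ]; then echo fifo; else echo other; fi; })
echo "stdin inside the pipeline is a: $kind"
```

This also answers the question's "/tmp, /dev, or …?": neither; the pipe exists only as a kernel object reachable through file descriptors.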

Related Solutions

How to pipe data over network or serial to the display of another linux machine

netcat springs to mind; given its no-overhead, no-compression approach to network communication, it may be the more sensible choice on your low-spec receiving machine.

A nice usage example can be found here:
https://stackoverflow.com/questions/4113986/example-of-using-named-pipes-in-linux-bash

Opening named pipe blocks forever, if pipe is deleted without being connected

As suggested by Julie Pelletier, I'm making this answer about the workaround we found in our discussion.

You cannot easily detect the deadlock described in my question, but as a workaround you can pre-emptively vent the named pipe before anyone deletes it (if, as in my case, that deletion really cannot be avoided). Venting should let any writer that is currently blocked trying to open the pipe complete its open() call, but then fail on the actual write (broken pipe). Failing might be better than a deadlock. To avoid running straight into the next deadlock, move the pipe out of the way before venting it, and delete it afterwards.
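The venting relies on one FIFO rule: open() for writing blocks until some process opens the FIFO for reading, and that reader's open() releases the writer even if the reader never reads anything. A sketch of just this mechanism, on a FIFO that is not deleted; the mktemp -d directory, the test.pipe name and the one-second sleep are illustrative assumptions:

```shell
#!/bin/sh
dir=$(mktemp -d)
fifo="$dir/test.pipe"
mkfifo "$fifo"

# Writer: the redirection blocks in open(2) because no reader exists yet.
echo hello 2>/dev/null > "$fifo" &
writer=$!
sleep 1
blocked=no
kill -0 "$writer" 2>/dev/null && blocked=yes   # still alive = still stuck in open()

# Vent: open the FIFO for reading and close it again without reading.
dd if="$fifo" count=0 2>/dev/null

# The writer's open() has now returned. Its write either lands in the buffer
# or fails with a broken pipe, depending on whether dd has already closed.
writer_status=0
wait "$writer" || writer_status=$?
echo "blocked before venting: $blocked; writer exit status: $writer_status"
rm -rf "$dir"
```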

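# rename the pipe, i.e. move it out of the way so that nobody new opens it
mv -f /tmp/test.pipe /tmp/test.pipe~ 2>/dev/null

# vent the pipe: briefly open it for reading (dd with count=0 reads nothing)
# and for writing (the empty redirection writes nothing). Opening the read
# end releases a writer blocked in open(); opening the write end releases a
# blocked reader. Both run in a background subshell so that this script does
# not deadlock itself when the venting turns out to be redundant.
(dd if=/tmp/test.pipe~ count=0 2>/dev/null & : >/tmp/test.pipe~ &)

# delete the old pipe
# (caveat: rm can win the race against the background openers above; dd then
# finds no file, and the redirection creates an empty regular file instead)
rm -f /tmp/test.pipe~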

If you know which program performs the faulty deletion, you might want to wrap it in a small script that does the venting and forgoes the deletion.
