Shell Pipes – How to Understand Pipes

Architecturepipeshellsystem-calls

When I just used pipe in bash, I didn't think more about this. But when I read some C code example using system call pipe() together with fork(), I wonder how to understand pipes, including both anonymous pipes and named pipes.

It is often heard that "everything in Linux/Unix is a file". I wonder if a pipe is actually a file so that one part it connects writes to the pipe file, and the other part reads from the pipe file? If yes, where is the pipe file for an anonymous pipe created? In /tmp, /dev, or …?

However, from examples of named pipes, I also learned that using pipes has space and time performance advantage over explicitly using temporary files, probably because there are no files involved in implementation of pipes. Also pipes seem not store data as files do. So I doubt a pipe is actually a file.

Best Answer

About your performance question, pipes are more efficient than files because no disk IO is needed. So cmd1 | cmd2 is more efficient than cmd1 > tmpfile; cmd2 < tmpfile (this might not be true if tmpfile is backed on a RAM disk or other memory device as named pipe; but if it is a named pipe, cmd1 should be run in the background as its output can block if the pipe becomes full). If you need the result of cmd1 and still need to send its output to cmd2, you should cmd1 | tee tmpfile | cmd2 which will allow cmd1 and cmd2 to run in parallel avoiding disk read operations from cmd2.

Named pipes are useful if many processes read/write to the same pipe. They can also be useful when a program is not designed to use stdin/stdout for its IO needing to use files. I put files in italic because named pipes are not exactly files in a storage point of view as they reside in memory and have a fixed buffer size, even if they have a filesystem entry (for reference purpose). Other things in UNIX have filesystem entries without being files: just think of /dev/null or others entries in /dev or /proc.

As pipes (named and unnamed) have a fixed buffer size, read/write operations to them can block, causing the reading/writing process to go in IOWait state. Also, when do you receive an EOF when reading from a memory buffer ? Rules on this behavior are well defined and can be found in the man.

One thing you cannot do with pipes (named and unnamed) is seek back in the data. As they are implemented using a memory buffer, this is understandable.

About "everything in Linux/Unix is a file", I do not agree. Named pipes have filesystem entries, but are not exactly file. Unnamed pipes do not have filesystem entries (except maybe in /proc). However, most IO operations on UNIX are done using read/write function that need a file descriptor, including unnamed pipe (and socket). I do not think that we can say that "everything in Linux/Unix is a file", but we can surely say that "most IO in Linux/Unix is done using a file descriptor".

Related Question