About your performance question: pipes are more efficient than files because no disk I/O is needed. So cmd1 | cmd2 is more efficient than cmd1 > tmpfile; cmd2 < tmpfile (this might not be true if tmpfile is backed by a RAM disk or another memory device, or is a named pipe; but if it is a named pipe, cmd1 should be run in the background, as it blocks until a reader opens the pipe and again whenever the pipe becomes full). If you need to keep the result of cmd1 and still send its output to cmd2, use cmd1 | tee tmpfile | cmd2, which lets cmd1 and cmd2 run in parallel and avoids disk reads by cmd2.
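To make the tee variant concrete, here is a minimal sketch. The commands are stand-ins I picked for illustration: seq plays the role of cmd1 and a line-counting awk plays cmd2; the temporary file comes from mktemp.

```shell
#!/bin/sh
# Stand-ins (assumptions): seq is "cmd1", the awk line counter is "cmd2".
tmpfile=$(mktemp)

# cmd1 | tee tmpfile | cmd2: tee copies the stream to tmpfile while
# passing it on, so cmd2 never has to read the data back from disk.
count=$(seq 1 1000 | tee "$tmpfile" | awk 'END { print NR }')

# The full output of cmd1 is still available afterwards.
saved=$(awk 'END { print NR }' "$tmpfile")

echo "cmd2 saw $count lines; tmpfile kept $saved lines"
rm -f "$tmpfile"
```

Both stages run concurrently here; the only disk traffic is tee writing the copy.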
Named pipes are useful if many processes read/write to the same pipe. They can also be useful when a program is not designed to use stdin/stdout for its I/O and needs to be given files instead. I say "files" loosely, because named pipes are not exactly files from a storage point of view: they reside in memory and have a fixed buffer size, even though they have a filesystem entry (for reference purposes). Other things in UNIX have filesystem entries without being files: just think of /dev/null or other entries in /dev or /proc.
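As a rough illustration of the "program wants a file name" case, the sketch below hands a named pipe to a reader as if it were a regular file. The path data.fifo and the choice of sort as the reader are my own assumptions for the example.

```shell
#!/bin/sh
# Hypothetical paths: a scratch directory holding one named pipe.
dir=$(mktemp -d)
fifo="$dir/data.fifo"
mkfifo "$fifo"

# The filesystem entry exists, but it is a FIFO, not a regular file.
if [ -p "$fifo" ]; then kind=fifo; else kind=other; fi

# The writer must run in the background: opening the FIFO for writing
# blocks until some process opens it for reading.
printf 'b\na\nc\n' > "$fifo" &

# The reader is given a file name argument and never touches the disk.
sorted=$(sort "$fifo")
wait

echo "$kind"
echo "$sorted"
rm -rf "$dir"
```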
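As pipes (named and unnamed) have a fixed buffer size, read/write operations on them can block: a writer sleeps while the buffer is full and a reader sleeps while it is empty. Also, when do you receive an EOF when reading from a memory buffer? A read returns end of file once every write end of the pipe has been closed and the buffer has been drained. The rules for this behavior are well defined and can be found in the pipe(7) and fifo(7) man pages.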
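The EOF rule can be seen with a small experiment (a sketch only; the one-second pause is arbitrary). The reader does not see end of file during the writer's pause, only once the writer has exited and closed its end of the pipe.

```shell
#!/bin/sh
# The writer emits a line, pauses, emits another line, then exits.
# awk's END block runs only at EOF, i.e. after the write end is closed,
# so it counts both lines even though the pipe sat empty for a second.
summary=$( { echo first; sleep 1; echo second; } |
           awk 'END { print NR " lines before EOF" }' )

echo "$summary"
```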
One thing you cannot do with pipes (named or unnamed) is seek back in the data: lseek on a pipe fails with ESPIPE. As they are implemented using a memory buffer whose contents are consumed as they are read, this is understandable.
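One common workaround, when a consumer really needs to go over the data twice, is to spool the pipe into a temporary file first. The function name two_pass below is hypothetical, and this is just one way to do it: the first pass computes a total, the second prints each value as a share of that total.

```shell
#!/bin/sh
# two_pass (hypothetical helper): needs the input twice, which a pipe
# cannot provide, so stdin is first copied to a seekable temp file.
two_pass() {
    spool=$(mktemp)
    cat > "$spool"
    # Pass 1: sum the values.
    total=$(awk '{ s += $1 } END { print s }' "$spool")
    # Pass 2: re-read the same data and print each value's percentage.
    awk -v t="$total" '{ printf "%s %.0f%%\n", $1, 100 * $1 / t }' "$spool"
    rm -f "$spool"
}

result=$(seq 1 4 | two_pass)
echo "$result"
```

The price is exactly the disk (or tmpfs) traffic that the pipe avoided in the first place.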
About "everything in Linux/Unix is a file": I do not agree. Named pipes have filesystem entries, but are not exactly files. Unnamed pipes do not have filesystem entries (except maybe in /proc). However, most I/O operations on UNIX are done using read/write functions that take a file descriptor, and this includes unnamed pipes (and sockets). I do not think we can say that "everything in Linux/Unix is a file", but we can surely say that "most I/O in Linux/Unix is done using a file descriptor".
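On Linux specifically, the /proc exception can be checked directly. This sketch assumes /proc is mounted and that readlink is available; it will not work on systems without a Linux-style /proc.

```shell
#!/bin/sh
# Linux-only assumption: /proc/self/fd/N is a symlink describing what
# file descriptor N of the current process refers to.
# Here readlink's own stdin (fd 0) is the read end of an unnamed pipe.
stdin_target=$(echo | readlink /proc/self/fd/0)

# Prints something of the form pipe:[inode number]
echo "$stdin_target"
```

The descriptor has no path anywhere else in the filesystem; the pipe:[...] text is only a label generated by the kernel.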
Seekable pipes have been proposed for the Linux kernel, but I'm not aware of a working patch to implement them.
You could use an LD_PRELOAD'ed library that overrides the lseek call on specific files. I don't know of any off-the-shelf wrapper for this purpose. Shadowfs might help in writing one.
Best Answer
Back when Unix was created, disks were very small, and it was common for a rather benign command to consume all the free space in a file system. For example,
produces output that's much smaller than the size of the output of the first command (i.e., the size of the intermediate file that would be created if you ran the commands the way you're running your programs).
Data flowing through pipes and sockets is (probably) not written to disk at all. Therefore, these IPC solutions may be