Linux Kernel – Differences Between FIFO, Pipe & Unix Domain Socket

Tags: fifo, ipc, linux-kernel, pipe, socket

I've heard that FIFOs are named pipes and that they have exactly the same semantics. On the other hand, I think a Unix domain socket is quite similar to a pipe (although I've never used one). So I wonder whether they all share the same implementation in the Linux kernel. Any idea?

Best Answer

UNIX domain sockets and FIFOs may share some parts of their implementation, but they are conceptually very different. A FIFO works at a very low level: one process writes bytes into the pipe and another one reads them. A UNIX domain socket behaves like a TCP/IP socket.

A socket is bidirectional and can be used by many processes simultaneously. A process can accept many connections on the same listening socket and serve several clients at once: each call to accept(2) returns a new file descriptor for one connection (the client gets its own descriptor from socket(2) and then calls connect(2)). Data written on a connection therefore always reaches the process at the other end of that connection.
With a FIFO, this is impossible. For bidirectional communication you need two FIFOs, and you need a separate pair of FIFOs for each client. There is no way to read or write selectively (for example, to or from one particular client), because FIFOs are a much more primitive means of communication.

Anonymous pipes and FIFOs are very similar. The difference is that anonymous pipes do not exist as files on the filesystem, so no process can open(2) them. They are used by processes that obtain them by some other means: if a process creates a pipe with pipe(2) and then calls, for example, fork(2), its child inherits its file descriptors, including both ends of the pipe.

UNIX domain sockets, anonymous pipes and FIFOs are similar in that the data passes through memory rather than through storage. Implementation details vary from one system to another. One obvious way to implement them would be to attach the same portion of memory into the address spaces of two distinct processes so that they share data, but that is not how Linux actually does it: Linux simply uses kernel memory for the buffers (see the answer by @tjb63 below).
The kernel then handles the system calls and abstracts the mechanism.

Related Solutions

Shell Pipes – How to Understand Pipes

About your performance question: pipes are more efficient than files because no disk I/O is needed. So cmd1 | cmd2 is more efficient than cmd1 > tmpfile; cmd2 < tmpfile. (This might not hold if tmpfile is on a RAM disk or another memory-backed device, such as a named pipe. But if it is a named pipe, cmd1 must be run in the background: opening the FIFO for writing blocks until a reader opens it, and writes block whenever the pipe buffer is full.) If you need to keep the output of cmd1 and still send it to cmd2, use cmd1 | tee tmpfile | cmd2, which lets cmd1 and cmd2 run in parallel and spares cmd2 from reading the file back from disk.

Named pipes are useful when many processes read from or write to the same pipe. They can also be useful when a program is not designed to use stdin/stdout for its I/O and insists on using files. I stress "files" because named pipes are not really files from a storage point of view: their data lives in memory in a fixed-size buffer, even though they have a filesystem entry (used only as a name to refer to them). Other things in UNIX have filesystem entries without being regular files: just think of /dev/null or other entries in /dev or /proc.

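Because pipes (named and unnamed) have a fixed-size buffer, I/O on them can block: a write blocks while the buffer is full and a read blocks while it is empty, and the process sleeps until the other side catches up. Also, when do you receive an EOF when reading from a memory buffer? The rules are well defined and described in the man pages (see pipe(7)): for example, read(2) returns 0 (EOF) once every write end has been closed and the buffered data has been consumed.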

One thing you cannot do with pipes (named and unnamed) is seek back in the data: lseek(2) on a pipe fails with ESPIPE. Since they are implemented as a memory buffer from which data is consumed as it is read, this is understandable.

About "everything in Linux/Unix is a file": I do not agree. Named pipes have filesystem entries, but are not exactly files. Unnamed pipes do not have filesystem entries (except indirectly in /proc, where /proc/PID/fd shows them as pipe:[inode] links). However, most I/O operations on UNIX are done with read/write functions that take a file descriptor, including on unnamed pipes (and sockets). I do not think we can say that "everything in Linux/Unix is a file", but we can certainly say that "most I/O in Linux/Unix is done through a file descriptor".

Problem with pipes. Pipe terminates when reader done

To find the cause, use strace:

tail -f | strace bash >> foo

The second echo (echo hello > pToB) then gives me this:

rt_sigprocmask(SIG_BLOCK, NULL, [], 8)  = 0
read(0, "e", 1)                         = 1
read(0, "c", 1)                         = 1
read(0, "h", 1)                         = 1
read(0, "o", 1)                         = 1
read(0, " ", 1)                         = 1
read(0, "h", 1)                         = 1
read(0, "e", 1)                         = 1
read(0, "l", 1)                         = 1
read(0, "l", 1)                         = 1
read(0, "o", 1)                         = 1
read(0, "\n", 1)                        = 1
write(1, "hello\n", 6)                  = -1 EPIPE (Broken pipe)
--- SIGPIPE {si_signo=SIGPIPE, si_code=SI_USER, si_pid=3299, si_uid=1000} ---
+++ killed by SIGPIPE +++

So the second time bash tries to write hello\n, it gets a broken pipe error (EPIPE) and is killed by SIGPIPE. That's why you can't read hello (it was never written), and since bash is gone, that's the end of it.

You'd have to use something that keeps the pipe open, I guess.

How about this?

(while IFS= read -r myline; do printf '%s\n' "$myline"; done) < pToP

For more background, man 7 pipe may be relevant: it describes the various error cases around pipes.
