About your performance question: pipes are more efficient than files because no disk IO is needed. So `cmd1 | cmd2` is more efficient than `cmd1 > tmpfile; cmd2 < tmpfile` (this might not be true if `tmpfile` is backed by a RAM disk or another memory device, as a named pipe is; but if it is a named pipe, `cmd1` should be run in the background, as its output can block if the pipe becomes full). If you need the result of `cmd1` and still need to send its output to `cmd2`, you should use `cmd1 | tee tmpfile | cmd2`, which will allow `cmd1` and `cmd2` to run in parallel, avoiding disk read operations from `cmd2`.
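As a minimal sketch of that `tee` pattern (`seq` and `wc -l` stand in for `cmd1` and `cmd2`, and `/tmp/numbers` is just an example path):

```shell
# Keep a copy of the intermediate output while still streaming it onward:
# wc -l receives the data directly from the pipe, and /tmp/numbers ends up
# holding the same five lines for later use.
seq 1 5 | tee /tmp/numbers | wc -l   # prints 5
```

`cmd2` never has to read the data back from disk; only the `tee` copy touches the filesystem.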
Named pipes are useful when many processes read/write to the same pipe. They can also be useful when a program is not designed to use stdin/stdout for its IO and needs to use *files*. I put *files* in italics because named pipes are not exactly files from a storage point of view: they reside in memory and have a fixed buffer size, even though they have a filesystem entry (for reference purposes). Other things in UNIX have filesystem entries without being files: just think of `/dev/null` or other entries in `/dev` or `/proc`.
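For illustration, a minimal named-pipe session (paths and data are made up):

```shell
# A named pipe has a filesystem entry, but its data lives in a kernel
# buffer, never on disk.
dir=$(mktemp -d)
mkfifo "$dir/fifo"
ls -l "$dir/fifo"                  # file type is 'p', size is 0
# The writer is backgrounded: the open()/write() block until a reader
# shows up at the other end.
printf 'hello\n' > "$dir/fifo" &
cat "$dir/fifo"                    # hello
wait
rm -r "$dir"
```

Note how the writer must run in the background, exactly as described above for `cmd1` with a named pipe.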
As pipes (named and unnamed) have a fixed buffer size, read/write operations on them can block, causing the reading/writing process to enter the IOWait state. Also, when do you receive an EOF when reading from a memory buffer? The rules on this behavior are well defined and can be found in the man pages.
One thing you cannot do with pipes (named and unnamed) is seek back in the data. As they are implemented using a memory buffer, this is understandable.
About "everything in Linux/Unix is a file": I do not agree. Named pipes have filesystem entries but are not exactly files. Unnamed pipes do not have filesystem entries (except maybe in `/proc`). However, most IO operations on UNIX are done using read/write functions that take a file descriptor, including on unnamed pipes (and sockets). I do not think we can say that "everything in Linux/Unix is a file", but we can surely say that "most IO in Linux/Unix is done using a file descriptor".
In `./binary < file`, `binary`'s stdin is the file, open in read-only mode. Note that `bash` doesn't read the file at all; it just opens it for reading on file descriptor 0 (stdin) of the process it executes `binary` in.
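A quick way to see this (Linux-specific, via `/proc`; the file name is illustrative):

```shell
printf 'data\n' > /tmp/in
# The shell opens /tmp/in on fd 0; the command never sees the file name,
# which is why wc prints only the byte count here:
wc -c < /tmp/in                        # 5
# On Linux, /proc shows what fd 0 actually points to:
readlink /proc/self/fd/0 < /tmp/in     # /tmp/in
```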
In:

```
./binary << EOF
test
EOF
```

Depending on the shell, `binary`'s stdin will be either a deleted temporary file (AT&T ksh, zsh, bash...) that contains `test\n` as put there by the shell, or the reading end of a pipe (`dash`, `yash`; the shell writes `test\n` in parallel at the other end of the pipe). In your case, if you're using `bash`, it would be a temp file (though recent `bash` versions may use a pipe for small here documents).
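You can check what your shell actually does with a Linux-specific `/proc` probe (the output varies by shell and version, so no single result is guaranteed):

```shell
# Ask bash what its here-document stdin really is:
bash -c 'readlink /proc/self/fd/0 <<EOF
test
EOF'
# Older bash: a deleted temp file, e.g. "/tmp/sh-thd-... (deleted)";
# bash >= 5.1 may instead show "pipe:[...]" for small here documents.
```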
In:

```
cat file | ./binary
```

Depending on the shell, `binary`'s stdin will be either the reading end of a pipe, or one end of a socket pair where the writing direction has been shut down (ksh93), and `cat` is writing the content of `file` at the other end.
When stdin is a regular file (temporary or not), it is seekable: `binary` may go to the beginning or end, rewind, etc. It can also `mmap()` it and do some `ioctl()`s like FIEMAP/FIBMAP; if using `<>` instead of `<`, it could also truncate it, punch holes in it, etc.
Pipes and socket pairs, on the other hand, are an inter-process communication means; there's not much `binary` can do besides `read`ing the data (though there are also some operations, like some pipe-specific `ioctl()`s, that it could do on them and not on regular files).
Most of the time, it's the missing ability to `seek` that causes applications to fail or complain when working with pipes, but it could be any of the other system calls that are valid on regular files but not on other types of files (like `mmap()`, `ftruncate()`, `fallocate()`). On Linux, there's also a big difference in behaviour when you open `/dev/stdin` depending on whether fd 0 is a pipe or a regular file.
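A small Linux-only experiment makes the difference visible (paths are illustrative):

```shell
printf 'hello\n' > /tmp/f
# fd 0 on a regular file: /proc/self/fd/0 points at the file itself.
readlink /proc/self/fd/0 < /tmp/f             # /tmp/f
# fd 0 on a pipe: it points at an anonymous pipe object instead,
# something of the form pipe:[<inode>].
printf 'hello\n' | readlink /proc/self/fd/0
```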
There are many commands out there that can only deal with seekable files, but when that's the case, that's generally not for the files open on their stdin.
```
$ unzip -l file.zip
Archive:  file.zip
  Length      Date    Time    Name
---------  ---------- -----   ----
       11  2016-12-21 14:43   file
---------                     -------
       11                     1 file
$ unzip -l <(cat file.zip)
# more or less the same as cat file.zip | unzip -l /dev/stdin
Archive:  /proc/self/fd/11
  End-of-central-directory signature not found.  Either this file is not
  a zipfile, or it constitutes one disk of a multi-part archive.  In the
  latter case the central directory and zipfile comment will be found on
  the last disk(s) of this archive.
unzip:  cannot find zipfile directory in one of /proc/self/fd/11 or
        /proc/self/fd/11.zip, and cannot find /proc/self/fd/11.ZIP, period.
```
`unzip` needs to read the index stored at the end of the file, and then seek within the file to read the archive members. But here, the file (regular in the first case, pipe in the second) is given as a path argument to `unzip`, and `unzip` opens it itself (typically on a fd other than 0) instead of inheriting a fd already opened by the caller. It doesn't read zip files from its stdin; stdin is mostly used for user interaction.
If you run that `binary` of yours without redirection at the prompt of an interactive shell running in a terminal emulator, then `binary`'s stdin will be inherited from its caller, the shell, which itself will have inherited it from its caller, the terminal emulator: a pty device open in read+write mode (something like `/dev/pts/n`). Those devices are not seekable either. So, if `binary` works OK when taking input from the terminal, the issue is probably not about seeking.
If that 14 is meant to be an errno (an error code set by failing system calls), then on most systems that would be `EFAULT` (*Bad address*). The `read()` system call would fail with that error if asked to read into a memory address that is not writable. That would be independent of whether the fd to read the data from points to a pipe or a regular file, and would generally indicate a bug¹.
`binary` possibly determines the type of file open on its stdin (with `fstat()`) and runs into a bug when it's neither a regular file nor a tty device.
Hard to tell without knowing more about the application. Running it under `strace` (or the `truss`/`tusc` equivalent on your system) could help us see which system call, if any, is failing here.
¹ The scenario envisaged by Matthew Ife in a comment to your question sounds quite plausible here. Quoting him:

> I suspect it is seeking to the end of file to get a buffer size for reading the data, badly handling the fact that seek doesn't work and attempting to allocate a negative size (not handling a bad malloc). Passing the buffer to read which faults given the buffer is not valid.
Best Answer
No, the programs that reject those files usually reject them on the ground that the file is not seekable (they need to access the content at arbitrary offsets, or several times after rewinding, etc.), or they want to open the file several times, or they want to rewrite (part of) the file or truncate it.
Unnamed pipes (like with `|` and `/dev/stdin`) or named ones make no difference in any of those cases. Actually, on Linux, `/dev/stdin` when stdin is a pipe (named or not) behaves exactly like a named pipe; the program would not be able to differentiate that `/dev/stdin` from a real named pipe. On other systems it's not exactly the same, but in effect, opening `/dev/stdin` or a named pipe will get you a file descriptor to a pipe, something that is not seekable either way.
So, you'll need to create the temporary file. Note that some shells make it easier: with `zsh`, the `=(...)` form of process substitution creates (and cleans up) the temporary file for you. On Linux, and with shells that use a deleted temporary file for here documents (like `bash`, `zsh` and some implementations of `ksh`), you can pass `/dev/stdin` as a path argument with a here document providing the content. However, that may mangle the contents of the file if it contains NUL characters or ends in empty lines.
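Sketches of both shortcuts (hedged: `zsh` must be installed for the first, and the second relies on Linux's `/dev/stdin` plus a shell that backs here documents with a temp file; `wc -c` and `printf hello` stand in for your application and your data):

```shell
# zsh: =(cmd) expands to the name of a temporary file holding cmd's
# output, so the application receives a real, seekable file.
command -v zsh >/dev/null && zsh -c 'wc -c =(printf hello)'

# bash on Linux: when the here document is backed by a (deleted) temp
# file, /dev/stdin names it and can be passed as a path argument:
wc -c /dev/stdin <<'EOF'
hello
EOF
```

Both commands report 6 bytes (`hello` plus the newline the here document adds).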
Note that since version 5, bash makes the here-doc temporary file read-only, so if the application needs to make modifications to that file, you'll have to restore the write permissions first.
A note about that `while read` loop, since you asked.
First, `read -r` without a variable name is not valid `sh` syntax. The `sh` syntax is specified by POSIX (ISO 9945, also IEEE Std 1003.1), like the C syntax is specified by ISO 9899. In that specification, you'll notice that `read` requires a variable name argument. The behaviour when you omit it is unspecified and in practice varies with the `sh` interpreter implementation. `bash` is the GNU `sh` interpreter, like `gcc` is the GNU C compiler; both `bash` and `gcc` have extensions over what those standards specify.
In the case of `read`, `bash` treats `read -r` as if it were `IFS= read -r REPLY`. In the POSIX spec, `IFS= read -r REPLY` reads stdin until either a `\n` character or the end of input is reached, stores the read characters into the `$REPLY` variable, and returns a success exit status if a newline character was read (a full line) or failure otherwise (like EOF before the newline); it leaves the behaviour undefined if the read data contains NUL characters or sequences of bytes that don't form valid characters. In the case of `bash`, it will store the bytes read even if they don't form valid characters, and it removes the NUL characters. `read -r` is like `read -r REPLY` in `ksh` or `zsh`, and reports an error in `yash` or `ash`-based POSIX-like shells.
The behaviour of `echo` is unspecified unless its arguments don't contain backslash characters and the first one is not `-n`.
So, to sum up: unless you know the particular `sh` implementation (and version) you're dealing with, you can't tell what that `while read` loop will do. In the case of `bash` specifically, it will store stdin into the temp_file only as long as the data doesn't contain NUL characters, ends in a newline character, and none of the lines matches the `^-[neE]+$` extended regular expression (and/or, depending on the environment or on how `bash` was compiled, like the `sh` of OS/X, doesn't contain backslash characters). It's also very inefficient and not the way you process text in shells.
Here, you want `cat > temp_file` instead. `cat` is a standard command which, when not given any argument, just dumps its stdin onto its stdout as-is.
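To see the difference on tricky input (a NUL byte, as illustrative data):

```shell
# The bash read loop silently drops the NUL byte:
printf 'a\0b\n' | bash -c 'while read -r; do echo "$REPLY"; done'   # ab
# cat passes all four bytes through untouched:
printf 'a\0b\n' | cat | od -An -c    #   a  \0   b  \n
```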