Linux – Creating temp file vs process substitution vs variable expansion


Suppose I do one of the following.

Creating a temporary file:

some process generating output > temp_file
cat temp_file

Process substitution:
```
cat <(some process generating output)
```

Another way:

cat <<<(some process generating output)

I have some questions about these:

Is there any limit on the size of the output with process substitution, <() and >(), or with variable expansion, <<<()?
Which of these is the fastest, or is there a faster way to do it?

My ulimit output is:

bash-3.00$ ulimit -a
core file size        (blocks, -c) unlimited
data seg size         (kbytes, -d) unlimited
file size             (blocks, -f) unlimited
open files                    (-n) 256
pipe size          (512 bytes, -p) 10
stack size            (kbytes, -s) 8480
cpu time             (seconds, -t) unlimited
max user processes            (-u) 8053
virtual memory        (kbytes, -v) unlimited

Best Answer

Bash process substitution in the form of <(cmd) and >(cmd) is implemented with pipes: through /dev/fd/N where the system supports it, and with named pipes otherwise. The command cmd is run with its input or output connected to the pipe. When you run e.g. cat <(sleep 10; ls), you can find the pipe under the directory /proc/pid_of_cat/fd. The name of this pipe is then passed as an argument to the current command (cat).
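
To see which name bash substitutes on a given system, here is a minimal sketch. It assumes bash; the variable names path and is_pipe are only illustrative.

```bash
#!/usr/bin/env bash
# echo receives the substituted file name as an ordinary argument.
path=$(echo <(true))
echo "substituted name: $path"

# test -p is true when the name refers to a pipe (FIFO).
if [ -p <(true) ]; then
    is_pipe=yes
else
    is_pipe=no
fi
echo "refers to a pipe: $is_pipe"
```

The exact name depends on the system and on how bash was built.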

The buffer capacity of a pipe can be estimated with a slightly tricky use of the dd command: dd writes data from /dev/zero to the standard input of a sleep command, which never reads it. Since sleep only sleeps, the buffer fills up and dd blocks:

(dd if=/dev/zero bs=1 | sleep 999) &

Give it a second and then send the USR1 signal to the dd process:

pkill -USR1 dd

This makes the process print its I/O statistics:

65537+0 records in
65536+0 records out
65536 bytes (66 kB) copied, 8.62622 s, 7.6 kB/s

In my test case, the buffer size is 64 KiB (65536 bytes).
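
The same measurement can be scripted so that no manual pkill is needed. This is only a sketch: it assumes GNU dd and GNU timeout, the names stats and filled are illustrative, and the number it reports depends on the system.

```bash
#!/usr/bin/env bash
stats=$(mktemp)

# dd writes single bytes into a pipe that sleep never reads from.
# After 1 second, timeout sends USR1 to dd, which makes GNU dd print its
# statistics to stderr (redirected to $stats). dd then stays blocked
# until sleep exits and the pipe is closed.
LC_ALL=C timeout -s USR1 1 dd if=/dev/zero bs=1 2>"$stats" | sleep 2

# "N+0 records out" with bs=1 means N bytes went into the pipe buffer.
filled=$(awk '/records out/ { split($1, a, "+"); print a[1]; exit }' "$stats")
echo "pipe buffer held $filled bytes"
rm -f "$stats"
```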

How do you use the <<<(cmd) expansion? As far as I know, <<< is a here string, a variation of here documents: the word after it is expanded and passed to the command on its standard input.
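
For reference, a here string takes a word rather than a command, so a command's output has to go through command substitution first. A short sketch, assuming bash (the variable names are illustrative):

```bash
#!/usr/bin/env bash
# The output of printf is captured by $(...), then fed to cat on stdin.
out=$(cat <<<"$(printf 'one\ntwo\n')")
echo "$out"

# A here string appends one newline to the word: "abc" arrives as 4 bytes.
bytes=$(( $(wc -c <<<"abc") ))
echo "$bytes"
```

Because the whole output is first collected by the shell, this form holds all of the data in memory before the reading command starts.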

Hopefully this sheds some light on the question about size. Regarding speed, I'm not so sure, but I would assume that both methods deliver similar throughput.

Related Solutions

Bash Reuse Process Substitution File

The <(…) construct creates a pipe. The pipe is passed via a file name like /dev/fd/63, but this is a special kind of file: opening it really means duplicating file descriptor 63. (See the end of this answer for more explanations.)
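
One consequence can be checked directly. In this sketch (assuming bash on a system where /dev/fd is available, such as Linux), a child shell receives the substituted name and reads it twice; the inline script passed to bash -c is purely illustrative.

```bash
#!/usr/bin/env bash
# The child shell receives the substituted name as $1 and reads it twice.
out=$(bash -c '
    first=$(cat "$1")    # drains the pipe
    second=$(cat "$1")   # same pipe again: nothing is left
    printf "[%s][%s]\n" "$first" "$second"
' _ <(echo data))
echo "$out"
```

The first read returns the data and the second one comes back empty.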

Reading from a pipe is a destructive operation: once you've caught a byte, you can't throw it back. So your script needs to save the output from the pipe. You can use a temporary file (preferable if the input is large) or a variable (preferable if the input is small). With a temporary file:

tmp=$(mktemp)
cat <"$1" >"$tmp"
cat <"$tmp"
grep hello <"$tmp"
sed 's/hello/world/g' <"$tmp"
rm -f "$tmp"

(You can combine the two calls to cat as tee <"$1" -- "$tmp".) With a variable:

tmp=$(cat)
printf "%s\n" "$tmp"
printf "%s\n" "$tmp" | grep hello
printf "%s\n" "$tmp" | sed 's/hello/world/g'

Note that command substitution $(…) strips all trailing newlines from the command's output. To avoid that, append an extra character to the output and strip it afterwards.

tmp=$(cat; echo a); tmp=${tmp%a}
printf "%s\n" "$tmp"
printf "%s\n" "$tmp" | grep hello
printf "%s\n" "$tmp" | sed 's/hello/world/g'
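
The effect of the extra character can be seen by comparing lengths. A small sketch, assuming bash; plain and kept are illustrative names:

```bash
#!/usr/bin/env bash
# Plain command substitution drops every trailing newline.
plain=$(printf 'x\n\n\n')

# With the sentinel, the newlines survive and only "a" is removed.
kept=$(printf 'x\n\n\n'; echo a)
kept=${kept%a}

echo "plain: ${#plain} character(s), kept: ${#kept} character(s)"
```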

By the way, don't forget the double quotes around variable substitutions.
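
Putting the temporary-file variant together, here is a sketch of a helper that reads a process substitution once and then uses the data twice. The function name reuse_input and the sample input are hypothetical; a recent bash is assumed.

```bash
#!/usr/bin/env bash
# reuse_input is a hypothetical helper: it copies its argument (which may
# be a pipe such as /dev/fd/63) to a temporary file once, then reads the
# copy as often as needed.
reuse_input() {
    local tmp
    tmp=$(mktemp) || return 1
    cat <"$1" >"$tmp"
    grep -c hello <"$tmp"              # first pass: count matching lines
    sed 's/hello/world/g' <"$tmp"      # second pass: same data again
    rm -f "$tmp"
}

result=$(reuse_input <(printf 'hello there\nbye\n'))
echo "$result"
```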

Shell – Subshell and process substitution

Process substitution is a feature that originated in the Korn shell in the 80s (in ksh86). At the time, it was only available on systems that had support for /dev/fd/<n> files.

Later, the feature was added to zsh (from the start: 1990) and bash (in 1993). zsh was using temporary named pipes to implement it, while bash was using /dev/fd/<n> where available and named pipes otherwise. zsh switched to using /dev/fd/<n> where available in 2.6-beta17 in 1996.

Support for process substitution via named pipes on systems without /dev/fd was only added to ksh in ksh93u+ in 2012. The public domain clone of ksh doesn't support it.

To my knowledge, no other Bourne-like shell supports it (rc, es and fish, which are not Bourne-like shells, support it but with a different syntax). yash has a <(...) construct, but that's for process redirection.

While quite useful, the feature was never standardized by POSIX. So you can't expect to find it in sh, and you shouldn't use it in an sh script.

Though the behaviour for <(...) is unspecified in POSIX (so there would be no harm in retaining it), bash disables the feature when called as sh or when called with POSIXLY_CORRECT=1 in its environment. (Bash 5.1 and later keep it available in POSIX mode.)

So, if you have a script that uses <(...), you should interpret it with a shell that supports the feature, such as zsh, bash or AT&T ksh (of course, you need to make sure the rest of the script's syntax is also compatible with that shell).

In any case:

cat <(cmd)

Can be written:

cmd | cat

Or just

cmd

For a command other than cat (that needs to be passed data via a file given as argument), on systems with /dev/fd/x, you can always do:

something | that-cmd /dev/stdin

Or if you need that-cmd's stdin to be preserved:

{ something 3<&- | that-cmd /dev/fd/4 4<&0 <&3 3<&-; } 3<&0
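
Both forms can be tried with a stand-in command. In this sketch, that_cmd is a hypothetical function that reads the file named by its first argument and then its own stdin; bash and a system with /dev/fd and /dev/stdin are assumed.

```bash
#!/usr/bin/env bash
# that_cmd is a hypothetical stand-in for a command that reads the file
# named by its first argument and then also reads its own stdin.
that_cmd() {
    cat "$1"
    cat
}

# Simple case: the data arrives on stdin and is named as /dev/stdin.
sorted=$(printf 'b\na\n' | sort /dev/stdin)

# Preserving stdin: fd 3 saves the original stdin, the pipe moves to
# fd 4, and stdin is restored from fd 3 before that_cmd runs.
both=$( { printf 'from pipe\n' 3<&- | that_cmd /dev/fd/4 4<&0 <&3 3<&-; } <<<'from stdin' 3<&0 )

echo "$sorted"
echo "$both"
```

The here string only supplies an original stdin for the demonstration; in a real script the outer stdin would be whatever the script was started with.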
