Duplicate output of a pipe

pipe process-substitution variable

The task is simple: part of my script has to compute both the md5 and sha1 hashes of a file, a big one, and store them in the MD and SH variables for composing output later.

Because the files being processed are really big (hundreds of GB), I am trying to read the data only once and use it for both hashes. I found something called process substitution, which I adopted in the following way:

$ dd if=big.tgz 2>/dev/null |tee >(sha1sum ) > >(md5sum ) ;

instead of:

$ SH=$(sha1sum big.tgz); MD=$(md5sum big.tgz);

But I ran into the following problems:

There is apparently no saving in resources or time, as both approaches take approx. 40 s (for a 4.776 GB file).
I have no idea how to save the result of the subprocess >(md5sum ) into the variable MD so that I can use it later in the script.

I tried to understand pipexec, but even with its nice color illustrations I have had no success so far.

Is there some other way to redirect output into a variable, other than VAR=$(command)?

Best Answer

On the subject of performance, you may be limited by CPU. Actually, 4.7 GB in 40 seconds for both md5sum and sha1sum feels fast. So even if hashing this way saves no time, for what it's worth, you will have reduced disk I/O.

You really don't need dd for this. You can also just write the output of sha1sum and md5sum directly to files for later use:

# Read big.tgz once; tee feeds the same bytes to both hash commands
tee < big.tgz >(sha1sum > big.tgz.sha1) > >(md5sum > big.tgz.md5)
sha1=$(cat big.tgz.sha1)
md5=$(cat big.tgz.md5)

I suggest using temp files like this (big.tgz.sha1 and big.tgz.md5) because, AFAIK, there is no way to set two variables with different values simultaneously from one command. You can capture one result straight into a variable, but not both. And letting both md5sum and sha1sum write to the same stdout at the same time might cause unpredictable problems.
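One possible variant of the temp-file idea replaces the >(...) substitutions with a named pipe and a background job, so that the script can wait explicitly until sha1sum has finished before reading its result. This is only a minimal sketch under stated assumptions: a small demo file stands in for big.tgz, and the names workdir, input and sha_pid are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: read the input once, compute two hashes, wait for both to finish.
# A small demo file stands in for big.tgz; all names here are illustrative.
workdir=$(mktemp -d)
input=$workdir/demo.bin
printf 'hello\n' > "$input"

mkfifo "$workdir/fifo"

# sha1sum reads from the FIFO in the background; remember its PID.
sha1sum < "$workdir/fifo" > "$workdir/sha1" &
sha_pid=$!

# tee copies the single read of the input to the FIFO and to md5sum.
tee "$workdir/fifo" < "$input" | md5sum > "$workdir/md5"

# Do not read the result files before the background job is done.
wait "$sha_pid"

# Both tools print "<hash>  -" for stdin; keep only the hash field.
SH=$(cut -d' ' -f1 < "$workdir/sha1")
MD=$(cut -d' ' -f1 < "$workdir/md5")

echo "sha1=$SH md5=$MD"
rm -rf "$workdir"
```

The explicit wait is the point of this variant: with the >(...) form, bash does not by default wait for the substituted processes, so the result files may still be incomplete if they are read immediately afterwards.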

Related Solutions

Shell – Process Substitution and Pipe

A good way to grok the difference between them is to do a little experimenting on the command line. In spite of the visual similarity in the use of the < character, process substitution does something very different from a redirect or pipe.

Let's use the date command for testing.

$ date | cat
Thu Jul 21 12:39:18 EEST 2011

This is a pointless example but it shows that cat accepted the output of date on STDIN and spit it back out. The same results can be achieved by process substitution:

$ cat <(date)
Thu Jul 21 12:40:53 EEST 2011

However what just happened behind the scenes was different. Instead of being given a STDIN stream, cat was actually passed the name of a file that it needed to go open and read. You can see this step by using echo instead of cat.

$ echo <(date)
/proc/self/fd/11

When cat received the file name, it read the file's content for us. On the other hand, echo just showed us the file's name that it was passed. This difference becomes more obvious if you add more substitutions:

$ cat <(date) <(date) <(date)
Thu Jul 21 12:44:45 EEST 2011
Thu Jul 21 12:44:45 EEST 2011
Thu Jul 21 12:44:45 EEST 2011

$ echo <(date) <(date) <(date)
/proc/self/fd/11 /proc/self/fd/12 /proc/self/fd/13

It is possible to combine process substitution (which generates a file) and input redirection (which connects a file to STDIN):

$ cat < <(date)
Thu Jul 21 12:46:22 EEST 2011

It looks pretty much the same, but this time cat was passed a STDIN stream instead of a file name. You can see this by trying it with echo:

$ echo < <(date)
<blank>

Since echo doesn't read STDIN and no argument was passed, we get nothing.

Pipes and input redirects shove content onto the STDIN stream. Process substitution runs the commands, connects their output to a special temporary file and then passes that file name in place of the command. Whatever command you are using treats it as a file name. Note that the file created is not a regular file but a pipe (a /dev/fd or /proc/self/fd entry as shown above, or a named pipe on systems without those) that goes away automatically once it is no longer needed.
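As a small illustration of why the file-name behaviour is useful, here is a hedged sketch using paste, which expects file names as arguments and therefore cannot take two inputs from a single pipe. It assumes bash (process substitution is not available in plain sh); the variable names out and path are only for the demo.

```shell
#!/usr/bin/env bash
# Requires bash or another shell that supports process substitution.

# paste wants file names; each <(...) expands to one, so two command
# outputs can be joined column by column.
out=$(paste <(printf 'a\nb\n') <(printf '1\n2\n'))
printf '%s\n' "$out"

# The expansion itself is just a path, typically under /dev/fd or /proc.
path=$(echo <(true))
echo "substitution expanded to: $path"
```

The exact path printed depends on the system, which is why the sketch prints it instead of assuming a value.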

Shell – Redirect and pipe output

Yes, this is a job for tee:

rpm -qa | tee file | wc -l

In the construction a | b, a's stdout goes to b's stdin. In the case of a > file | b, all output from a goes to file and nothing goes to b's stdin. The tee command makes a copy of everything it receives on stdin to both file and stdout.
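The difference between the two constructions can be checked with a short sketch. The three-line printf input and the names copy, other, count, empty and saved are hypothetical stand-ins for rpm -qa and file.

```shell
#!/bin/sh
# Sketch: compare "a | tee file | b" with "a > file | b".
copy=$(mktemp)
other=$(mktemp)

# With tee, the file gets a copy and wc still sees all three lines.
count=$(printf 'one\ntwo\nthree\n' | tee "$copy" | wc -l)

# With a plain redirect, everything goes to the file and wc sees nothing.
empty=$(printf 'one\ntwo\nthree\n' > "$other" | wc -l)

saved=$(cat "$copy")
# $((...)) strips any padding that wc adds around the number.
echo "with tee: $((count)) lines reached wc"
echo "with > file: $((empty)) lines reached wc"

rm -f "$copy" "$other"
```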
