Duplicate output of a pipe

pipe process-substitution variable

Well, the task is simple: a part of my script has to compute both md5 and sha1 hashes. The input is a file (a big one), and the hashes have to be stored in the MD and SH variables so I can compose the output later.

Since the processed files are really big (hundreds of GB), I am trying to read the data once and use it several times. I found something called process substitution, which I adopted in the following way:

$ dd if=big.tgz 2>/dev/null |tee >(sha1sum ) > >(md5sum ) ;

instead of:

$ SH=$(sha1sum big.tgz); MD=$(md5sum big.tgz);

But I found the following:

  • there is apparently no saving in resources or time, as both variants take approx. 40 s (for a 4.776 GB file)

  • I have no idea how to save the result of the subprocess >(md5sum ) into the variable MD to use it later in the script

I tried to understand pipexec, but even with the nice color illustrations I have had no success so far.

Is there some other way to redirect the output to a variable, other than VAR=$(command)?

Best Answer

On the subject of performance, you may be limited by CPU rather than by disk. Actually, 4.7 GB in 40 seconds for both md5sum and sha1sum feels fast. So even if working this way does not save time, for what it's worth you will have reduced disk I/O.

You really don't need dd for this. You can also just write the output of sha1sum and md5sum directly to files for later use:

tee < big.tgz  >(sha1sum > big.tgz.sha1 ) > >(md5sum > big.tgz.md5 )
sha1=$(cat big.tgz.sha1)
md5=$(cat big.tgz.md5)

I'm suggesting temp files like this (big.tgz.sha1 and big.tgz.md5) because AFAIK there's no way to set two variables to different values from a single command. You can capture one result straight into a variable, but not both. And letting md5sum and sha1sum write to the same stdout at the same time might cause unpredictable problems.
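
One caveat with the temp-file approach: bash does not wait for >( ) process substitutions to finish, so a cat issued right after tee could, in principle, read a file that is still empty. Below is a minimal sketch that sidesteps this with named pipes and an explicit wait. It assumes bash, mkfifo, mktemp and the coreutils md5sum/sha1sum; the function name hash_both and the file names inside the temp directory are made up for illustration.

```shell
#!/usr/bin/env bash
# hash_both FILE (hypothetical helper, not a standard command)
# Reads FILE once and sets the global variables MD and SH.
hash_both() {
    local file=$1 dir
    dir=$(mktemp -d) || return 1
    mkfifo "$dir/md5.fifo" "$dir/sha1.fifo" || { rm -rf "$dir"; return 1; }

    # Start both hashers in the background; each blocks until tee
    # opens the other end of its named pipe.
    md5sum  < "$dir/md5.fifo"  > "$dir/md5.out"  &
    sha1sum < "$dir/sha1.fifo" > "$dir/sha1.out" &

    # Single pass over the file: tee writes one copy to the md5 pipe
    # (as its file argument) and one copy to the sha1 pipe (as stdout).
    tee "$dir/md5.fifo" < "$file" > "$dir/sha1.fifo"

    # Block until both hashers have written their results.
    wait

    # Each output line looks like "<hex digest>  -"; keep the digest only.
    read -r MD _ < "$dir/md5.out"
    read -r SH _ < "$dir/sha1.out"

    rm -rf "$dir"
}

# Example call (assumes big.tgz exists):
#   hash_both big.tgz
#   echo "md5=$MD sha1=$SH"
```

After the call, MD and SH hold just the hex digests, and the file has still been read only once. The results do pass through small temp files, but the wait guarantees they are complete before they are read.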
