How to extract a tar.gz file to two destinations concurrently


I have some tar.gz files that I need to extract to two USB hard disks. With the command tar -xvzf, I can only extract a tar.gz to one hard disk at a time.

How can I extract a tar.gz file to both hard disks concurrently? The files I need to migrate total about 4.5 TB, so this would save me a lot of time.

Best Answer

With zsh, using process substitution and the tee-like behaviour you get when you redirect a file descriptor several times (the multios option, enabled by default):

zcat file.tar.gz > >(cd /media/disk1 && tar xf -) > >(cd /media/disk2 && tar xf -)

With other shells with support for process substitution (ksh, bash):

{
  zcat file.tar.gz 4>&- |
    tee >({ cd /media/disk1 && tar xf -; } >&4 4>&-) |
    { cd /media/disk2 && tar xf -; } 4>&-
} 4>&1

POSIXly on systems with /dev/fd/x:

zcat file.tar.gz | {
  {
    tee /dev/fd/3 |
      { cd /media/disk1 && tar xf -; } 3>&-
  } 3>&1 >&4 4>&- |
    { cd /media/disk2 && tar xf -; } 4>&-
} 4>&1
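
Where neither process substitution nor /dev/fd/x is available, the same single-pass idea can be sketched with a named pipe. This is a different technique from the ones above, shown only as a minimal illustration: the function name extract_to_both and its arguments are hypothetical, gzip -dc stands in for zcat, and mktemp -d is assumed to exist (it is widely available but not required by POSIX).

```shell
# Hypothetical helper: decompress one archive once and feed the stream
# to two tar processes, using a named pipe (FIFO) for the first one.
# usage: extract_to_both ARCHIVE DEST1 DEST2
extract_to_both() {
  archive=$1 dest1=$2 dest2=$3

  # Private temporary directory holding the FIFO (mktemp -d is an assumption).
  tmpdir=$(mktemp -d) || return 1
  fifo=$tmpdir/stream
  mkfifo "$fifo" || { rm -rf "$tmpdir"; return 1; }

  # First tar runs in the background and reads from the FIFO.
  (cd "$dest1" && tar xf -) < "$fifo" &
  reader=$!

  # tee copies the decompressed stream to the FIFO and to the second tar.
  gzip -dc < "$archive" | tee "$fifo" | (cd "$dest2" && tar xf -)
  status2=$?

  # Collect the exit status of the background tar, then clean up.
  wait "$reader"
  status1=$?
  rm -rf "$tmpdir"

  [ "$status1" -eq 0 ] && [ "$status2" -eq 0 ]
}
```

It would be called as extract_to_both file.tar.gz /media/disk1 /media/disk2. As with the tee variants above, both tar processes are paced by the slower destination.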

To do it for several tar.gz files, that's just (still in zsh):

for f (*.tar.gz) zcat -- $f > >(cd /media/disk1 && tar xf -) > >(cd /media/disk2 && tar xf -)

Note that the files will be extracted at the speed of the slowest destination drive.

Another approach could be to run extractions for both drives in parallel (here using POSIX sh syntax):

for f in *.tar.gz; do zcat -- "$f" | (cd /media/disk1 && tar xf -); done &
for f in *.tar.gz; do zcat -- "$f" | (cd /media/disk2 && tar xf -); done
wait
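
The two loops above do not report whether either extraction failed. A minimal sketch of the same parallel approach with exit-status checking could look like this; the function names extract_all_into and extract_in_parallel are hypothetical, and gzip -dc stands in for zcat.

```shell
# Hypothetical helper: extract each archive in turn into one destination.
# usage: extract_all_into DEST ARCHIVE...
extract_all_into() {
  dest=$1
  shift
  for f in "$@"; do
    # Only the exit status of the tar side of the pipeline is checked here.
    gzip -dc < "$f" | (cd "$dest" && tar xf -) || return 1
  done
}

# Hypothetical helper: run one extraction loop per destination in parallel
# and succeed only if both loops succeed.
# usage: extract_in_parallel DEST1 DEST2 ARCHIVE...
extract_in_parallel() {
  dest1=$1 dest2=$2
  shift 2

  extract_all_into "$dest1" "$@" &
  pid1=$!
  extract_all_into "$dest2" "$@" &
  pid2=$!

  # wait PID returns the exit status of that background job.
  status=0
  wait "$pid1" || status=1
  wait "$pid2" || status=1
  return "$status"
}
```

It would be called as extract_in_parallel /media/disk1 /media/disk2 ./*.tar.gz. Each archive is still decompressed twice, once per destination.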

However, note that:

  • it means uncompressing each file twice, which uses more CPU;
  • if one destination drive is significantly slower than the other, some tar.gz files could be evicted from the cache before the slower loop reaches them, and would then be read twice from the source disk, resulting in more I/O on your machine.