Linux – How to parallelize dd

Tags: dd, linux, parallelism

I'm currently having trouble with dd invoked with a sparse file as input (if=) and a file as output (of=) with conv=sparse. dd seems to use only one core of the CPU (Intel(R) Core(TM) i7-3632QM CPU @ 2.20GHz, 4 cores + 4 Intel Hyperthreads), saturating that single core at 100 %, so I've been wondering whether it's possible to parallelize dd. I've been

  • looking into info dd and man dd, and there seems to be no built-in function for this in coreutils 8.23
  • checking sgp_dd from the sg3-utils package (without understanding whether it suits my needs), but it doesn't seem to be able to handle sparse files
  • dcfldd doesn't seem to have parallelization capabilities

AFAIK

  • an enhanced version/fork of dd with internal multithreading (avoiding the context switches that kill I/O performance) is preferred over
  • a solution with GNU parallel running locally, which is preferred over
  • a custom (possibly untested) code snippet

How can I avoid the CPU being the bottleneck of an I/O-intensive operation? I'd like to run the command on Ubuntu 14.04 with Linux 3.13 and handle sparse file disk images with it on any filesystem supporting sparse files (at least, the solution shouldn't be bound to one specific filesystem).

Background: I'm trying to create a copy of an 11 TB sparse file (containing about 2 TB of data) on zfs (zfsonlinux 0.6.4, an unstable version that is possibly buggy and the cause of the CPU bottleneck, perhaps via a slow hole search). That shouldn't change anything about the question of how to parallelize dd (in a very generic way).

Best Answer

Tested in Bash:

INFILE=in
# Emit chunk offsets 0, 1000, 2000, … up to the input size in 100000-byte
# blocks; GNU parallel then runs one dd per chunk, with skip/seek keeping
# the input and output positions aligned so chunks land in the right place.
seq 0 1000 $(( $(stat --format %s $INFILE) / 100000 )) |
  parallel -k dd if=$INFILE bs=100000 skip={} conv=sparse seek={} count=1000 of=out

You will probably need to tune the chunk size (1000 blocks per job) and the block size (bs=100000) for your hardware and filesystem.
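If GNU parallel is not available, the same chunked-copy idea can be sketched with plain shell background jobs. This is an illustrative sketch, not a tested production command: the file names and the BS/CHUNK values are placeholders, and it pre-sizes the output and adds conv=notrunc so that concurrently running dd jobs cannot truncate each other's already-written chunks.

```shell
#!/bin/sh
# Sketch: chunked parallel copy with plain background jobs (no GNU parallel).
# Assumptions: GNU coreutils (stat, truncate, dd); BS/CHUNK are illustrative.
INFILE=in.img
OUTFILE=out.img
BS=100000      # bytes per dd block
CHUNK=100      # blocks copied by each dd job

# Demo input only -- replace with the real sparse image.
dd if=/dev/urandom of="$INFILE" bs="$BS" count=$((4 * CHUNK)) 2>/dev/null

SIZE=$(stat --format %s "$INFILE")
BLOCKS=$(( (SIZE + BS - 1) / BS ))   # round up to cover a partial last block

# Pre-size the output; with conv=notrunc no job truncates the file, and
# conv=sparse still seeks over NUL blocks, leaving holes in the fresh file.
truncate -s "$SIZE" "$OUTFILE"

skip=0
while [ "$skip" -lt "$BLOCKS" ]; do
    dd if="$INFILE" of="$OUTFILE" bs="$BS" count="$CHUNK" \
       skip="$skip" seek="$skip" conv=notrunc,sparse 2>/dev/null &
    skip=$((skip + CHUNK))
done
wait    # block until every chunk job has finished

cmp "$INFILE" "$OUTFILE"
```

Without the notrunc/pre-sizing precaution, GNU dd truncates the output at the seek offset before writing, which is a race when several jobs share one output file.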
