I have a 1 TB file. I would like to read from byte 12345678901 to byte 19876543212 and put that on standard output on a machine with 100 MB RAM.
I can easily write a Perl script that does this. sysread delivers 700 MB/s (which is fine), but syswrite only delivers 30 MB/s. I would like something more efficient, preferably something that is installed on every Unix system and that can deliver on the order of 1 GB/s.
My first idea is:
dd if=1tb skip=12345678901 bs=1 count=$((19876543212-12345678901))
But that is not efficient.
Edit:
I have no idea how I measured syswrite wrong. This delivers 3.5 GB/s:
perl -e 'sysseek(STDIN,shift,0) || die; $left = shift;
while($read = sysread(STDIN,$buf, ($left > 32768 ? 32768 : $left))){
$left -= $read; syswrite(STDOUT,$buf);
}' 12345678901 $((19876543212-12345678901)) < bigfile
and avoids the yes | dd bs=1024k count=10 | wc nightmare.
Best Answer
This is slow because of the small block size. Using a recent GNU dd (coreutils v8.16+), the simplest way is to use the skip_bytes and count_bytes options (set with iflag=), which make skip= and count= byte counts instead of block counts.
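A minimal sketch of the skip_bytes/count_bytes approach, assuming GNU dd from coreutils 8.16 or later. The helper name extract_range, the 4096-byte default block size, and the half-open [start, end) convention are assumptions made for illustration, not part of dd; fullblock is included so that a short read cannot silently drop data.

```shell
#!/bin/sh
# Sketch only: copy bytes [start, end) of a file to standard output.
# extract_range is a hypothetical helper name; requires GNU dd >= 8.16.
extract_range() {
    in_file=$1
    start=$2                 # offset of the first byte to copy (0-based)
    end=$3                   # offset just past the last byte to copy
    block_size=${4:-4096}    # assumed default; tune as needed
    copy_size=$((end - start))

    # skip_bytes / count_bytes: treat skip= and count= as bytes, not blocks.
    # fullblock: keep reading until each input block is full.
    dd if="$in_file" iflag=skip_bytes,count_bytes,fullblock \
       bs="$block_size" skip="$start" count="$copy_size"
}
```

With the numbers from the question this would be called as extract_range 1tb 12345678901 19876543212. dd prints its transfer statistics on standard error, so standard output carries only the requested bytes.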
The fullblock flag was added to the command in an update, as per @Gilles' answer. At first I thought that it might be implied by count_bytes, but this is not the case.
The issues mentioned there are a potential problem for the method below: if dd's read/write calls are interrupted for any reason, data will be lost. This is unlikely in most cases (the odds are reduced somewhat since we are reading from a file and not a pipe).
Using a dd without the skip_bytes and count_bytes options is more difficult.
You could also experiment with different block sizes, but the gains won't be very dramatic. See "Is there a way to determine the optimal value for the bs parameter to dd?"
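For a dd that lacks skip_bytes and count_bytes, one possible way to do the more difficult variant is to split the range into an unaligned head copied with bs=1, a block-aligned middle copied with a large block size, and an unaligned tail copied with bs=1. The following is a sketch under that assumption, not necessarily the exact commands originally posted; the function and variable names are hypothetical, and the range is again taken as half-open [start, end).

```shell
#!/bin/sh
# Sketch only: copy bytes [start, end) using plain block-based dd options.
# extract_range_portable and all variable names are hypothetical.
extract_range_portable() {
    in_file=$1
    start=$2
    end=$3
    block_size=${4:-4096}

    # Head: bytes from start up to the next block boundary (or to end, if sooner).
    head_size=$(( (block_size - start % block_size) % block_size ))
    if [ "$head_size" -gt $((end - start)) ]; then
        head_size=$((end - start))
    fi

    # Middle: whole blocks, addressed in units of block_size.
    mid_start=$((start + head_size))
    mid_skip=$((mid_start / block_size))
    mid_blocks=$(( (end - mid_start) / block_size ))

    # Tail: whatever is left after the last whole block.
    tail_start=$((mid_start + mid_blocks * block_size))
    tail_size=$((end - tail_start))

    # Each dd is guarded because POSIX leaves count=0 unspecified.
    # Without fullblock, a short read in the middle copy would lose data.
    {
        if [ "$head_size" -gt 0 ]; then
            dd if="$in_file" bs=1 skip="$start" count="$head_size"
        fi
        if [ "$mid_blocks" -gt 0 ]; then
            dd if="$in_file" bs="$block_size" skip="$mid_skip" count="$mid_blocks"
        fi
        if [ "$tail_size" -gt 0 ]; then
            dd if="$in_file" bs=1 skip="$tail_start" count="$tail_size"
        fi
    } 2>/dev/null
}
```

Only the head and tail, each smaller than one block, go through the slow bs=1 path; the bulk of the data is moved by the middle dd. Note that 2>/dev/null hides dd's error messages along with its statistics, so drop it while debugging.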