The first 2 lines in dd stats have the following format:
a+b records in
c+d records out
Why 2 numeric values? What does this plus sign mean?
It's usually a+0, but sometimes when I use a bigger block size, dd prints 0+b records out.
From the spec:
If the bs=expr operand is specified and no conversions other than sync, noerror, or notrunc are requested, the data returned from each input block shall be written as a separate output block; if the read() returns less than a full block and the sync conversion is not specified, the resulting output block shall be the same size as the input block.
So this is probably what causes your confusion. Yes: because dd is designed for blocking, with bs= partial read()s are mapped 1:1 to partial write()s by default, or else padded at the tail with NUL bytes (or spaces, if block or unblock is also given) up to the bs= size when conv=sync is specified.
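A quick way to see what conv=sync changes is to copy a file whose size is not a multiple of bs and count the bytes that come out. This is a sketch assuming a dd that follows the spec text above (GNU or BSD dd), plus mktemp, wc and tr; the 5-byte scratch file is made up purely for the illustration.

```shell
#!/bin/sh
# Sketch: a 5-byte file copied with bs=4 gives one full and one short block.
# Without conv=sync the short block is written as-is (4 + 1 = 5 bytes);
# with conv=sync it is padded with NUL bytes up to bs (4 + 4 = 8 bytes).
tmp=$(mktemp)                 # hypothetical scratch file
printf 'abcde' > "$tmp"

plain=$(dd if="$tmp" bs=4 2>/dev/null | wc -c | tr -d ' ')
synced=$(dd if="$tmp" bs=4 conv=sync 2>/dev/null | wc -c | tr -d ' ')
rm -f "$tmp"

echo "without conv=sync: $plain bytes"
echo "with conv=sync:    $synced bytes"
```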
This means that dd is safe to use for copying data (w/ no risk of corruption due to a partial read or write) in every case but one: when it is arbitrarily limited by a count= argument. Otherwise, dd will happily write() its output in blocks identical in size to those in which its input was read(), until it has read() all the way through its input. And even this caveat only holds when bs= is specified or obs= is not specified, as the very next sentence in the spec states:
If the bs=expr operand is not specified, or a conversion other than sync, noerror, or notrunc is requested, the input shall be processed and collected into full-sized output blocks until the end of the input is reached.
Without ibs= and/or obs= arguments this can't matter, because ibs and obs are both the same size by default. However, you can get explicit about input buffering by specifying different sizes for either and not specifying bs= (because bs= takes precedence).
For example, if you do:
IN | dd ibs=1 | OUT
...then a POSIX dd will write() in chunks of 512 bytes, collecting every singly read() byte into a single output block.
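The collecting behaviour can be observed in dd's own statistics. This sketch assumes GNU or BSD dd and uses obs=4 instead of the default 512 only so that a 10-byte input is enough to show it; LC_ALL=C keeps the messages in English so they can be compared.

```shell
#!/bin/sh
# Sketch: ibs and obs given separately, no bs=. dd read()s one byte at a time
# and collects the bytes into 4-byte output blocks before each write().
# 10 input bytes -> 10 full input records; output is 4 + 4 + 2 bytes,
# i.e. two full output blocks plus one partial block at EOF.
stats=$(printf 'abcdefghij' | LC_ALL=C dd ibs=1 obs=4 of=/dev/null 2>&1 | head -n 2)
printf '%s\n' "$stats"
```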
Otherwise, if you do...
IN | dd obs=1kx1k | OUT
...a POSIX dd will read() at most 512 bytes at a time, but write() every megabyte-sized output block in full (kernel allowing, and excepting possibly the last one, because that's EOF) by collecting input into full-sized output blocks.
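Only the "records out" line is checked in the sketch below, because how many bytes each 512-byte read() gets from a pipe depends on scheduling, while the collected output blocks do not. It assumes head -c is available (GNU and BSD both provide it) and spells 1kx1k as 1048576 for portability.

```shell
#!/bin/sh
# Sketch: with obs=1048576 (same size as 1kx1k) and the default 512-byte ibs,
# dd reads small chunks but writes whole 1 MiB blocks. 3,000,000 input bytes
# become two full output blocks plus one partial block at EOF.
out_stats=$(head -c 3000000 /dev/zero | LC_ALL=C dd obs=1048576 of=/dev/null 2>&1 | sed -n 2p)
echo "$out_stats"
```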
Also from the spec, though:
count=n
Copy only n input blocks.
count= maps to i?bs= blocks, so in order to handle an arbitrary limit on count= portably you'll need two dds. The most practical way to do that with two dds is to pipe the output of one into the input of the other, which puts us squarely in the realm of reading/writing a special file regardless of the original input type.
An IPC pipe means that, to specify [io]bs= arguments safely, you must keep their values within the system's defined PIPE_BUF limit. POSIX only requires the kernel to guarantee that write()s to a pipe are atomic within the limit of PIPE_BUF, as defined in limits.h. POSIX guarantees that PIPE_BUF be at least {_POSIX_PIPE_BUF}, which is 512 bytes (and also happens to be the default dd i/o block size), but the actual value is usually at least 4k. On Linux, PIPE_BUF is 4096 bytes, while a pipe's default capacity on an up-to-date system is 64k.
So when you set up your dd processes, you should do it on a block factor based on three values:
1. bs = ( obs = PIPE_BUF or lesser )
2. n = total desired number of bytes read
3. count = n / bs
Like:
yes | dd obs=1k | dd bs=1k count=10k of=/dev/null
10240+0 records in
10240+0 records out
10485760 bytes (10 MB) copied, 0.1143 s, 91.7 MB/s
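The same 10 MiB example can be written with the block-factor arithmetic spelled out. This is a sketch assuming n is an exact multiple of bs (shell integer division truncates, so any remainder would be silently dropped) and that bs=1024 is within PIPE_BUF, which holds on Linux.

```shell
#!/bin/sh
# Sketch of the block-factor arithmetic: pick bs no larger than PIPE_BUF,
# decide how many bytes n you want, and let count = n / bs.
bs=1024
n=10485760
count=$((n / bs))

# The first dd re-blocks yes's output into bs-sized writes; the second dd
# then reads exactly count full blocks of bs bytes.
stats=$(yes | dd obs="$bs" 2>/dev/null | LC_ALL=C dd bs="$bs" count="$count" of=/dev/null 2>&1 | head -n 2)
printf '%s\n' "$stats"
```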
You have to synchronize i/o w/ dd to handle non-seekable inputs. In other words, make pipe buffers explicit and they cease to be a problem; that's what dd is for. The unknown quantity here is yes's buffer size, but if you block that out to a known quantity with another dd, then a little informed multiplication can make dd safe to use for copying data (w/ no risk of corruption due to a partial read or write) even when arbitrarily limiting input w/ count=, with any input type, on any POSIX system, and without missing a single byte.
Here's a snippet from the POSIX spec:
ibs=expr
Specify the input block size, in bytes, by expr (default is 512).
obs=expr
Specify the output block size, in bytes, by expr (default is 512).
bs=expr
Set both input and output block sizes to expr bytes, superseding ibs= and obs=. If no conversion other than sync, noerror, and notrunc is specified, each input block shall be copied to the output as a single block without aggregating short blocks.
You'll also find some of this explained better here.
As others have mentioned here, using just dd won't work, due to the copy of the GPT table placed at the end of the disk.
I have managed to perform a migration to a smaller drive using the following method:
First, boot into a live CD distro of your choice.
Resize the source drive partitions so that they fit within the smaller drive's constraints (using gparted, for example).
Then, assuming sda is the source disk, use sgdisk to first create a backup of the GPT table from the source drive, to be on the safe side:
sgdisk -b=gpt.bak.bin /dev/sda
Assuming sdb is the target, replicate the table from the source drive to the target:
sgdisk -R=/dev/sdb /dev/sda
sgdisk will now complain that it tried placing the header copy out of the bounds of the target disk, but will then fall back and place the header correctly at the upper bound of the target disk.
Verify that a correct clone of the partition table has been created on the target drive using the tool of your choice (gparted, for example).
Using dd, copy each partition from the source drive to the target:
dd if=/dev/sda1 of=/dev/sdb1 bs=1M
dd if=/dev/sda2 of=/dev/sdb2 bs=1M
dd if=/dev/sda3 of=/dev/sdb3 bs=1M
etc...
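Typing one dd per partition is where names get mixed up, so a loop can help. This is a sketch only: the disk names, the partition numbers and the CLONE_FOR_REAL opt-in variable are all hypothetical, and by default it just prints the commands (a dry run) so they can be checked before anything is overwritten.

```shell
#!/bin/sh
# Sketch: copy partitions 1..3 of a source disk onto the same-numbered
# partitions of a target disk. All names below are assumptions; check them.
# Dry run by default; set CLONE_FOR_REAL=yes to actually run dd.
src=/dev/sda          # assumed source disk
dst=/dev/sdb          # assumed target disk
parts="1 2 3"         # assumed partition numbers on both disks

if [ "${CLONE_FOR_REAL:-}" = yes ]; then run=; else run=echo; fi

copy_partitions() {
  for n in $parts; do
    # With run=echo this prints the command; with run empty it executes it.
    $run dd if="${src}${n}" of="${dst}${n}" bs=1M
  done
}

copy_partitions
```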
Obviously, if you mix up the names of the drives when replicating the GPT partition table without a backup, or when dd-ing the content, you can kiss your content goodbye :)
Best Answer
It means the number of full blocks of that bs size, plus the number of extra blocks smaller than bs.
Edit: frostschutz's answer mentions another case that generates non-full blocks. Worth reading. See also https://unix.stackexchange.com/a/17357/73443.
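A minimal reproduction of the a+b format: copy a file whose size is not a multiple of bs. This sketch assumes GNU or BSD dd and mktemp; the 5-byte scratch file is made up for the example, and LC_ALL=C keeps dd's messages in English.

```shell
#!/bin/sh
# Sketch: copy a 5-byte file with bs=2. dd read()s 2, 2 and then 1 byte,
# so it reports 2 full blocks plus 1 partial block, both in and out.
tmp=$(mktemp)            # hypothetical scratch file
printf 'abcde' > "$tmp"
stats=$(LC_ALL=C dd if="$tmp" of=/dev/null bs=2 2>&1 | head -n 2)
rm -f "$tmp"
printf '%s\n' "$stats"
```

With a regular file only the final read can be short; reading a pipe or terminal with a large bs is what produces the 0+b case from the question, since every read() can return less than a full block.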