There are 5 tunables in the /proc file system (under /proc/sys/vm) that change Linux's writeback behavior:
dirty_async_ratio
dirty_background_ratio
dirty_sync_ratio
dirty_expire_centisecs
dirty_writeback_centisecs
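The exact set of dirty_* files differs between kernel versions, so it is worth checking what the running kernel actually exposes before tuning anything. A minimal sketch, assuming a Linux system with /proc mounted (show_dirty_tunables is just an illustrative helper name):

```shell
# List whichever dirty_* writeback tunables this kernel exposes, with values.
show_dirty_tunables() {
    for f in /proc/sys/vm/dirty_*; do
        # Skip the unexpanded glob on systems that lack these files.
        [ -r "$f" ] || continue
        printf '%s = %s\n' "${f##*/}" "$(cat "$f")"
    done
}

show_dirty_tunables
```

This only reads the current values; changing them requires root.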
The configuration is quite complicated; documentation can be found at kernel.org. However, as jordanm already said, "Any userspace application can tell the kernel to write its dirty buffers to disk via the sync() system call", which means that any other process might render your configuration useless.
Also keep your filesystem settings in mind: mount options like noatime, data=writeback and nobarrier can dramatically improve your throughput but will also put your data at risk if your disk controllers are not battery-backed.
Summary: dd is a cranky tool that is hard to use correctly. Don't use it, despite the numerous tutorials that tell you to. dd has a “unix street cred” vibe attached to it, but if you truly understand what you're doing, you'll know that you shouldn't be touching it with a 10-foot pole.
dd makes a single call to the read system call per block (the block size is set by bs). There is no guarantee that the read system call returns as much data as the specified buffer size. This tends to work for regular files and block devices, but not for pipes and some character devices. See “When is dd suitable for copying data? (or, when are read() and write() partial)” for more information. If the read system call returns less than one full block, then dd transfers a partial block. It still copies the specified number of blocks, so the total number of transferred bytes is less than requested.
The warning about a “partial read” tells you exactly this: one of the reads was partial, so dd transferred an incomplete block. In the block counts, +1 means that one block was read partially; since the output count is +0, all blocks were written out as read.
This doesn't affect the randomness of the data: all the bytes that dd writes out are bytes that it read from /dev/urandom. But you got fewer bytes than expected.
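A partial read is easy to provoke on a pipe. The sketch below is only an illustration: slow_writer is a made-up helper that delivers 6 bytes in two bursts, and the result depends on timing, so the counts in the comments are what you would typically see rather than a guarantee.

```shell
# Hypothetical writer: 6 bytes in two bursts, one second apart.
slow_writer() { printf 'aaa'; sleep 1; printf 'bbb'; }

# Ask for one 6-byte block. dd issues a single read(), which normally
# returns as soon as the first 3 bytes are available.
# stderr (dd's statistics) is captured; the copied data is discarded.
stats=$(slow_writer | LC_ALL=C dd bs=6 count=1 2>&1 >/dev/null)

# Typically shows "0+1 records in": zero full blocks, one partial block.
printf '%s\n' "$stats"
```

With GNU dd the captured text usually also contains the “partial read” warning itself.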
Linux's /dev/urandom accommodates arbitrarily large requests (source: extract_entropy_user in drivers/char/random.c), so dd is normally safe when reading from it. However, reading large amounts of data takes time. If the process receives a signal, the read system call returns before filling its output buffer. This is normal behavior, and applications are supposed to call read in a loop; dd doesn't do this, for historical reasons (dd's origins are murky, but it seems to have started out as a tool to access tapes, which have peculiar requirements, and was never adapted to be a general-purpose tool). When you check the progress, this sends the dd process a signal which interrupts the read. You have a choice between knowing how many bytes dd will copy in total (make sure not to interrupt it: no progress check, no suspension), or knowing how many bytes dd has copied so far, in which case you can't know how many more bytes it will copy.
The version of dd in GNU coreutils (as found on non-embedded Linux and on Cygwin) has a flag fullblock which tells dd to call read in a loop and thus always transfer full blocks. The error message suggests that you use it; you should always use it, except in very special circumstances (mostly when accessing tapes). It is an input flag only: GNU dd rejects oflag=fullblock. That is, if you use dd at all: there are usually better solutions (see below).
dd if=/dev/urandom iflag=fullblock of=file bs=1M count=1000000
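To see the effect of the flag without writing a terabyte, here is a small comparison on a pipe. It assumes GNU dd (for iflag=fullblock); slow_writer is a hypothetical helper, and the count without the flag is timing-dependent.

```shell
# Hypothetical writer: 6 bytes in two bursts, one second apart.
slow_writer() { printf 'aaa'; sleep 1; printf 'bbb'; }

# Without the flag, dd stops after whatever the first read() returned
# (typically only the first burst).
plain=$(slow_writer | dd bs=6 count=1 2>/dev/null | wc -c)

# With the flag, dd keeps reading until the 6-byte block is complete.
full=$(slow_writer | dd bs=6 count=1 iflag=fullblock 2>/dev/null | wc -c)

echo "without fullblock: $plain bytes; with iflag=fullblock: $full bytes"
```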
Another possible way to be sure of what dd will do is to pass a block size of 1. Then you can tell how many bytes were copied from the block count, though I'm not sure what will happen if a read is interrupted before reading the first byte (which is not very likely in practice but can happen). However, even if it works, this is very slow.
The general advice on using dd is: do not use dd. Although dd is often advertised as a low-level command to access devices, it is in fact no such thing: all the magic happens in the device file (the /dev/… part); dd is just an ordinary tool with a high potential for misuse resulting in data loss. In most cases, there is a simpler and safer way to do what you want, at least on Linux.
For example, to read a certain number of bytes at the beginning of a file, just call head:
head -c 1000000m </dev/urandom >file
I made a quick benchmark on my machine and did not observe any performance difference between dd with a large block size and head.
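A scaled-down version of the head command is an easy way to convince yourself that you get exactly the number of bytes you asked for. This is just a sanity check with an arbitrary 1 MiB size and a scratch file, not a benchmark:

```shell
tmp=$(mktemp)

# Ask head for exactly 1 MiB of random data.
head -c 1048576 </dev/urandom >"$tmp"

# Count what actually landed in the file.
size=$(wc -c <"$tmp")
echo "requested 1048576 bytes, got $size"

rm -f "$tmp"
```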
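If you need to skip some bytes at the beginning, pipe tail into head. The dd command on the first line (note skip=, which skips input blocks, not seek=, which skips output blocks) is equivalent to the pipeline on the second:
dd if=input of=output count=C bs=B skip=S
<input tail -c +$((S*B+1)) | head -c $((C*B)) >output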
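The equivalence can be checked on a small regular file, where dd is well-behaved. The values B=10, S=2, C=3 and the file names are arbitrary choices for the sake of the example:

```shell
B=10 S=2 C=3
tmpdir=$(mktemp -d)

# 100 bytes of known input: the digits 0-9 repeated ten times.
i=0
while [ "$i" -lt 10 ]; do printf '0123456789'; i=$((i + 1)); done >"$tmpdir/input"

# Skip S blocks of B bytes, then copy C blocks, once with dd...
dd if="$tmpdir/input" of="$tmpdir/out_dd" bs="$B" skip="$S" count="$C" 2>/dev/null
# ...and once with tail piped into head.
<"$tmpdir/input" tail -c +$((S*B+1)) | head -c $((C*B)) >"$tmpdir/out_pipe"

if cmp -s "$tmpdir/out_dd" "$tmpdir/out_pipe"; then result=identical; else result=different; fi
nbytes=$(wc -c <"$tmpdir/out_pipe")
echo "$result, $nbytes bytes"

rm -rf "$tmpdir"
```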
If you want to see progress, call lsof to see the file offset. This only works on a regular file (the output file in your example), not on a character device.
lsof -a -p 1234 -d 1
cat /proc/1234/fdinfo/1
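If you only want the offset as a number, the pos: line of the fdinfo file can be picked out with awk. This is a Linux-only sketch; fd_offset is an illustrative helper, and the demonstration inspects the current shell rather than a real copy in progress.

```shell
# Print the file offset of descriptor $2 in process $1 (Linux /proc only).
fd_offset() {
    awk '$1 == "pos:" { print $2 }' "/proc/$1/fdinfo/$2"
}

# Demonstration on the current shell: open fd 3 on a scratch file,
# write 5 bytes through it, then read the offset back.
scratch=$(mktemp)
exec 3>"$scratch"
printf 'hello' >&3
offset=$(fd_offset "$$" 3)
echo "fd 3 is at offset $offset"

exec 3>&-
rm -f "$scratch"
```

For a running copy you would call something like fd_offset 1234 1, with the PID and descriptor of the process writing the output file.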
You can call pv to get a progress report (better than dd's), at the expense of an additional item in the pipeline (performance-wise, it's barely perceptible).
Best Answer
From the spec:
"If the bs=expr operand is specified and no conversions other than sync, noerror, or notrunc are requested, the data returned from each input block shall be written as a separate output block; if the read() returns less than a full block and the sync conversion is not specified, the resulting output block shall be the same size as the input block."
So this is probably what causes your confusion. Yes, because dd is designed for blocking, by default partial read()s will be mapped 1:1 to partial write()s, or else synced out with tail padding of NUL or space chars to bs= size when conv=sync is specified.
This means that dd is safe to use for copying data (with no risk of corruption due to a partial read or write) in every case but one in which it is arbitrarily limited by a count= argument, because otherwise dd will happily write() its output in identically sized blocks to those in which its input was read() until it read()s completely through it. And even this caveat is only true when bs= is specified or obs= is not specified, as the very next sentence in the spec states:
"If the bs=expr operand is not specified, or a conversion other than sync, noerror, or notrunc is requested, the input shall be processed and collected into full-sized output blocks until the end of the input is reached."
Without ibs= and/or obs= arguments this can't matter, because ibs and obs are both the same size by default. However, you can get explicit about input buffering by specifying different sizes for either and not specifying bs= (because it takes precedence).
For example, if you do:
...then a POSIX dd will write() in chunks of 512 bytes by collecting every singly read() byte into a single output block.
Otherwise, if you do...
...a POSIX dd will read() at maximum 512 bytes at a time, but write() every megabyte-sized output block (kernel allowing, and excepting possibly the last, because that's EOF) in full by collecting input into full-sized output blocks.
Also from the spec, though:
count=n
"Copy only n input blocks."
count= maps to i?bs= blocks, and so in order to handle an arbitrary limit on count= portably you'll need two dds. The most practical way to do it with two dds is by piping the output of one into the input of another, which surely puts us in the realm of reading/writing a special file regardless of the original input type.
An IPC pipe means that when specifying [io]bs= args, to do so safely, you must keep such values within the system's defined PIPE_BUF limit. POSIX states that the system kernel must only guarantee atomic read()s and write()s within the limits of PIPE_BUF as defined in limits.h. POSIX guarantees that PIPE_BUF be at least ...
{_POSIX_PIPE_BUF}
...(which also happens to be the default dd i/o blocksize), but the actual value is usually at least 4k. On an up-to-date Linux system PIPE_BUF is 4096 bytes; the 64k figure is the default capacity of a pipe, not the atomicity limit.
So when you set up your dd processes you should do it on a block factor based on three values:
PIPE_BUF or lesser )
Like:
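A minimal sketch of such a pipeline, assuming a 512-byte block size (no larger than any conforming PIPE_BUF) and yes as a stand-in for an input with unknown write sizes. The block size and count here are illustrative choices, not necessarily the values the answer originally used:

```shell
# Atomicity limit for pipe writes on this system (POSIX minimum: 512).
getconf PIPE_BUF /

# First dd: no bs=, so input is collected into full 512-byte output
# blocks, each written to the pipe with a single write().
# Second dd: each 512-byte read() then receives a whole block, so
# count=2000 means exactly 2000 * 512 bytes.
total=$(yes | dd obs=512 2>/dev/null | dd bs=512 count=2000 2>/dev/null | wc -c)

echo "copied $total bytes (requested $((2000 * 512)))"
```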
You have to synchronize i/o with dd to handle non-seekable inputs. In other words, make pipe-buffers explicit and they cease to be a problem. That's what dd is for. The unknown quantity here is yes's buffer size, but if you block that out to a known quantity with another dd then a little informed multiplication can make dd safe to use for copying data (with no risk of corruption due to a partial read or write) even when arbitrarily limiting input with count=, with any arbitrary input type, on any POSIX system, and without missing a single byte.
Here's a snippet from the POSIX spec:
ibs=expr
Specify the input block size, in bytes, by expr (default is 512).
obs=expr
Specify the output block size, in bytes, by expr (default is 512).
bs=expr
Set both input and output block sizes to expr bytes, superseding ibs= and obs=. If no conversion other than sync, noerror, and notrunc is specified, each input block shall be copied to the output as a single block without aggregating short blocks.
You'll also find some of this explained better here.
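The aggregation rule for the case where bs= is not given can be observed on a tiny input. The sizes below (ibs=1, obs=4, six bytes of input) are arbitrary values picked to make the record counts easy to read:

```shell
# Six 1-byte reads, collected into 4-byte output blocks.
# stderr (dd's statistics) is captured; the copied data is discarded.
stats=$(printf 'abcdef' | LC_ALL=C dd ibs=1 obs=4 2>&1 >/dev/null)

# "6+0 records in":  six full 1-byte input blocks.
# "1+1 records out": one full 4-byte block plus a final 2-byte partial one.
printf '%s\n' "$stats"
```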