How can I create a new file and fill it with 1 Gigabyte worth of random data? I need this to test some software.
I would prefer to use /dev/random or /dev/urandom.
Linux has two random number generators available to userspace, /dev/random and /dev/urandom.

/dev/random is a source of "true" randomness - i.e. it is not generated by a pseudo-random number generator. Entropy is fed into it by the input driver and the interrupt handler, through the functions add_input_randomness and add_interrupt_randomness. Processes reading this device will block if the entropy runs out.

/dev/urandom is a pseudo-random number generator. It is fed by the same entropy pool as /dev/random, but when that runs out, it switches to a cryptographically strong generator.

Userspace applications can feed the entropy pool by writing to /dev/{,u}random.

Have a read of the random(4) manual page, and the file drivers/char/random.c in the kernel source tree. It is well commented, and most of what you ask is explained there.
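Both devices can be exercised directly from the shell (a quick sketch, assuming a Linux system for the /proc path):

```shell
# How much entropy the kernel currently estimates it has (Linux-specific)
cat /proc/sys/kernel/random/entropy_avail

# /dev/urandom never blocks: take 16 random bytes and print them as hex
head -c 16 /dev/urandom | od -An -tx1
```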
FreeBSD's /dev/random is by default a pseudo-random number generator using the Yarrow algorithm (but it can point to a hardware RNG if one is connected). The software generator takes entropy from Ethernet and serial connections and from hardware interrupts (changeable through sysctl kern.random). The Yarrow algorithm is believed to be secure as long as its internal state is unknown, so /dev/random should always output high-quality data without blocking.
See random(4).
On NetBSD, /dev/random provides random data based only on collected entropy (from disks, network, input devices, and/or tape drives; adjustable using rndctl), while /dev/urandom falls back to a PRNG when the entropy pool is empty, similar to Linux.
See random(4), rndctl(8), rnd(9).
OpenBSD has four generators: /dev/random is a hardware generator, /dev/srandom is a secure random data generator (using MD5 on the entropy pool: "disk and network device interrupts and such"), and /dev/urandom is similar but falls back to a PRNG when the entropy pool is empty. The fourth, /dev/arandom, is also a PRNG, but one using RC4.
See random(4), arc4random(3).
Mac OS X also uses the Yarrow algorithm for /dev/random, but has an identically working /dev/urandom for compatibility. "Additional entropy is fed to the generator regularly by the SecurityServer daemon from random jitter measurements of the kernel." See random(4).
Summary: dd is a cranky tool which is hard to use correctly. Don't use it, despite the numerous tutorials that tell you to. dd has a "unix street cred" vibe attached to it, but if you truly understand what you're doing, you'll know that you shouldn't be touching it with a 10-foot pole.
dd makes a single call to the read system call per block (defined by the value of bs). There is no guarantee that the read system call returns as much data as the specified buffer size. This tends to work for regular files and block devices, but not for pipes and some character devices. See When is dd suitable for copying data? (or, when are read() and write() partial) for more information. If the read system call returns less than one full block, then dd transfers a partial block. It still copies the specified number of blocks, so the total amount of transferred bytes is less than requested.
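A contrived demonstration of a partial read from a pipe (a sketch; the sleep makes the producer's two writes arrive separately, so dd's single read() returns after only one byte):

```shell
# dd asks for one 2-byte block, but the first read() finds only "a"
# in the pipe, so it copies a partial block and stops (count=1);
# the status output on stderr reads "0+1 records in".
{ printf a; sleep 1; printf b; } | dd bs=2 count=1
```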
The warning about a "partial read" tells you exactly this: one of the reads was partial, so dd transferred an incomplete block. In the block counts, +1 means that one block was read partially; since the output count is +0, all blocks were written out as read.
This doesn't affect the randomness of the data: all the bytes that dd writes out are bytes that it read from /dev/urandom. But you got fewer bytes than expected.
Linux's /dev/urandom accommodates arbitrarily large requests (source: extract_entropy_user in drivers/char/random.c), so dd is normally safe when reading from it. However, reading large amounts of data takes time. If the process receives a signal, the read system call returns before filling its output buffer. This is normal behavior, and applications are supposed to call read in a loop; dd doesn't do this, for historical reasons (dd's origins are murky, but it seems to have started out as a tool to access tapes, which have peculiar requirements, and was never adapted to be a general-purpose tool). When you check the progress, this sends the dd process a signal which interrupts the read. You have a choice between knowing how many bytes dd will copy in total (make sure not to interrupt it: no progress check, no suspension), or knowing how many bytes dd has copied so far, in which case you can't know how many more bytes it will copy.
The version of dd in GNU coreutils (as found on non-embedded Linux and on Cygwin) has a flag fullblock which tells dd to call read in a loop (and ditto for write) and thus always transfer full blocks. The error message suggests that you use it; you should always use it (in both input and output flags), except in very special circumstances (mostly when accessing tapes), if you use dd at all, that is: there are usually better solutions (see below).
dd if=/dev/urandom iflag=fullblock oflag=fullblock of=file bs=1M count=1000
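To convince yourself that iflag=fullblock delivers exactly the requested amount, run a smaller version and check the resulting file size (a sketch assuming GNU dd; the file name f is arbitrary):

```shell
# 16 MiB from /dev/urandom in guaranteed-full 1 MiB blocks
dd if=/dev/urandom iflag=fullblock of=f bs=1M count=16 2>/dev/null
wc -c f    # reports 16777216
```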
Another possible way to be sure of what dd will do is to pass a block size of 1. Then you can tell how many bytes were copied from the block count, though I'm not sure what will happen if a read is interrupted before reading the first byte (which is not very likely in practice but can happen). However, even if it works, this is very slow.
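With bs=1, every transfer is a single byte, so the record count in dd's status output equals the byte count (a sketch; the file name f is arbitrary):

```shell
# 4096 one-byte transfers: the record count equals the byte count,
# at the cost of roughly one read()/write() pair per byte.
# The status output on stderr reads "4096+0 records in".
dd if=/dev/urandom of=f bs=1 count=4096
```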
The general advice on using dd is: do not use dd. Although dd is often advertised as a low-level command to access devices, it is in fact no such thing: all the magic happens in the device file (the /dev/… part); dd is just an ordinary tool with a high potential for misuse, resulting in data loss. In most cases, there is a simpler and safer way to do what you want, at least on Linux.
For example, to read a certain number of bytes at the beginning of a file, just call head:
head -c 1G </dev/urandom >file
I made a quick benchmark on my machine and did not observe any performance difference between dd with a large block size and head.
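The benchmark can be reproduced along these lines (sizes shrunk here; absolute times depend on the machine, and GNU head is assumed for the M suffix):

```shell
# Same amount of data through both tools; compare the reported times
time head -c 100M </dev/urandom >/dev/null
time dd if=/dev/urandom of=/dev/null bs=1M count=100 2>/dev/null
```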
If you need to skip some bytes at the beginning, pipe tail into head. The following dd command:
dd if=input of=output count=C bs=B skip=S
is equivalent to:
<input tail -c +$((S*B+1)) | head -c $((C*B)) >output
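A quick way to check the correspondence on a small regular file (a sketch; C=3, B=4, S=2 and the file names are arbitrary):

```shell
seq 1000 > input                      # some convenient test data

# dd: skip S=2 blocks of B=4 bytes, then copy C=3 blocks
dd if=input of=out1 bs=4 count=3 skip=2 2>/dev/null

# tail/head: skip S*B bytes (tail -c +N starts at byte N, 1-based),
# then keep C*B bytes
<input tail -c +$((2*4+1)) | head -c $((3*4)) > out2

cmp out1 out2 && echo "identical"
```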
If you want to see progress, call lsof to see the file offset. This only works on a regular file (the output file in your example), not on a character device.
lsof -a -p 1234 -d 1
cat /proc/1234/fdinfo/1
You can call pv to get a progress report (better than dd's), at the expense of an additional item in the pipeline (performance-wise, it's barely perceptible).
Best Answer
On most unices:
head -c 1G </dev/urandom >file
If your head doesn't understand the G suffix you can specify the size in bytes:
head -c 1073741824 </dev/urandom >file
If your head doesn't understand the -c option (it's common but not POSIX; you probably have OpenBSD):
dd bs=1024 count=1048576 </dev/urandom >file
Do not use /dev/random on Linux, use /dev/urandom.