Performance – Why Is /dev/random So Slow When Using dd?

Tags: dd, hard drive, performance, random number generator, secure-erase

I am trying to semi-securely erase a bunch of hard drives. The following works at 20-50 MB/s:

dd if=/dev/zero of=/dev/sda

But

dd if=/dev/random of=/dev/sda 

seems not to work. Also when I type

dd if=/dev/random of=stdout

it only gives me a few bytes, regardless of what I pass for bs= and count=.

Am I using /dev/random wrong? What other info should I look for to move this troubleshooting forward? Is there some other way to do this with a script or something like

makeMyLifeEasy | dd if=stdin of=/dev/sda

Or something like that…

Best Answer

Both /dev/random and /dev/urandom use an "entropy pool". When the pool runs out, /dev/random waits for it to refill, which requires monitoring system behavior (keyboard input, mouse movement, etc.), whereas /dev/urandom will continue to give you pseudo-random data. /dev/random is theoretically higher quality, but /dev/urandom is almost certainly good enough for your purposes. (But even /dev/urandom is likely to be slower than some other methods. A faster, but lower quality, generator is probably good enough for erasing hard drives. It's not clear that an attacker would gain any advantage from knowing the sequence that's going to be generated, or that random numbers are better for this purpose than a sequence like 0, 1, 2, 3, 4, ....)
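As a rough sketch of the /dev/urandom route (not a vetted wipe procedure): the snippet below assumes GNU dd, and TARGET is a hypothetical scratch file standing in for the drive, so nothing real gets overwritten. Two side notes on the commands in the question: of=stdout writes to a file literally named "stdout" (leave of= off to write to standard output), and the "few bytes" result is likely because a short read from a blocking device still counts as one block toward count=. GNU dd's iflag=fullblock asks it to keep reading until each block is full.

```shell
#!/bin/sh
# Assumption: TARGET is a scratch file used for illustration only.
# For a real wipe it would be the whole-disk device node, which this
# sketch deliberately does not touch.
TARGET="${TARGET:-./scratch.img}"

# bs=1M (GNU dd syntax) requests 1 MiB per read; count=4 stops after 4 blocks.
# iflag=fullblock (GNU dd) retries short reads so every block is really full.
dd if=/dev/urandom of="$TARGET" bs=1M count=4 iflag=fullblock 2>/dev/null

# Report how many bytes actually landed in the target.
SIZE=$(wc -c < "$TARGET")
echo "wrote $SIZE bytes to $TARGET"
```

For a whole drive you would drop count= so dd runs until the device is full, and triple-check the device name before pressing Enter.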

Quoting the random(4) man page:

If you are unsure about whether you should use /dev/random or /dev/urandom, then probably you want to use the latter. As a general rule, /dev/urandom should be used for everything except long-lived GPG/SSL/SSH keys.

UPDATE: The random(4) man page has been updated since I wrote that. It now says:

The /dev/random interface is considered a legacy interface, and /dev/urandom is preferred and sufficient in all use cases, with the exception of applications which require randomness during early boot time; for these applications, getrandom(2) must be used instead, because it will block until the entropy pool is initialized.

See also "Myths about /dev/urandom" by Thomas Hühn.

But /dev/urandom, even though it won't block, is likely to be too slow if you want to generate huge amounts of data. Take some measurements on your system before trying it.
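One way to take that measurement, sketched under the assumption of GNU dd (which prints a throughput summary on stderr): read a fixed amount from each source into /dev/null and compare the summary lines. MIB and the throughput helper are names made up for this example, and nothing is written to disk.

```shell
#!/bin/sh
# MIB is an arbitrary sample size picked for this sketch.
MIB=64

# dd reports its transfer statistics on stderr; keep only the summary line.
throughput() {
    # $1 = device to read from
    LC_ALL=C dd if="$1" of=/dev/null bs=1M count="$MIB" 2>&1 | tail -n 1
}

ZERO_STATS=$(throughput /dev/zero)
URANDOM_STATS=$(throughput /dev/urandom)

echo "/dev/zero:    $ZERO_STATS"
echo "/dev/urandom: $URANDOM_STATS"
```

If the /dev/urandom figure is well above what the drive can sustain on writes, the generator is not the bottleneck and there is little to gain from a faster one.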

EDIT: The following is a digression on "true" random numbers vs. pseudo-random numbers. If all you're interested in is a practical answer to the question, you can stop reading now.

I've seen claims (including in other answers here) that /dev/random implements a "true" random number generator, as opposed to a pseudo-random number generator (PRNG). For example, the Wikipedia article makes such a claim. I don't believe that's correct. There's some discussion of it here which refers to hardware random number generators, but I see no evidence that /dev/random typically uses such a device, or that typical computers even have such a device. /dev/random and /dev/urandom do differ from PRNGs like the C rand() function in that they're not deterministic, since they harvest entropy from sources that are practically unpredictable.

I'd say there are three classes of "random" number generators:

  1. Deterministic PRNGs, like C's rand() function, which use an algorithm to generate repeatable sequences that have (more or less) the statistical properties of a truly random sequence. These can be good enough for games (given a good way of seeding them), and are necessary for applications that require repeatability, but they're not suitable for cryptography.

  2. Generators like /dev/random and /dev/urandom that harvest entropy from some practically unpredictable source like I/O activity (this is why pounding on the keyboard or moving the mouse can cause /dev/random to produce more data). It's not clear (to me) whether these satisfy the definition of a PRNG (I've seen definitions that say a PRNG is deterministic), but neither are they true random number generators.

  3. Hardware random number generators that are physically unpredictable even with complete knowledge of their initial state, and that additionally use mathematical techniques to ensure the right statistical properties.
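To make the difference between the first two classes concrete, here is a small sketch. It uses awk's rand() as a stand-in for C's rand() (my substitution, to stay in the shell), with an arbitrary seed of 42; the seeded and kernel helpers are names invented for this example.

```shell
#!/bin/sh
# Class 1: a deterministic PRNG. awk's rand(), seeded with a fixed value,
# produces the same sequence on every run.
seeded() {
    awk 'BEGIN { srand(42); for (i = 0; i < 4; i++) printf "%d ", int(rand() * 1000) }'
}
RUN1=$(seeded)
RUN2=$(seeded)

# Class 2: the kernel generator. Two separate 16-byte reads, shown as hex,
# are expected to differ from each other.
kernel() {
    head -c 16 /dev/urandom | od -An -tx1 | tr -d ' \n'
}
K1=$(kernel)
K2=$(kernel)

echo "seeded run 1: $RUN1"
echo "seeded run 2: $RUN2"
echo "kernel read 1: $K1"
echo "kernel read 2: $K2"
```

The two seeded runs should print identical numbers, while the two kernel reads should not match. That says nothing about whether the kernel output is "truly" random, only that it is not repeatable from a known seed the way class 1 is.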