I have been wondering what would be the best way to get good randomness in bash, i.e., what would be a procedure to get a random positive integer between MIN
and MAX
such that
- The range can be arbitrarily large (or at least, say, up to 232-1);
- Values are uniformly distributed (i.e., no bias);
- It is efficient.
An efficient way to get randomness in bash is to use the $RANDOM
variable. However, this only samples a value between 0 and 215-1, which may not be large enough for all purposes. People typically use a modulo to get it into the range they want, e.g.,
MIN=0
MAX=12345
rnd=$(( $RANDOM % ($MAX + 1 - $MIN) + $MIN ))
This, additionally, creates a bias unless $MAX
happens to divide 215-1=32767. E.g., if $MIN
is 0 and $MAX
is 9, then the values 0 through 7 are slightly more probable than the values 8 and 9, as $RANDOM
will never be 32768 or 32769. This bias gets worse as the range increases, e.g., if $MIN
is 0 and $MAX
is 9999, then the numbers 0 through 2767 have a probability of 4/32767, while the numbers 2768 through 9999 only have a probability of 3/32767.
So while the above method fulfills condition 3, it does not fulfill conditions 1 and 2.
The best method that I came up with so far in trying to satisfy conditions 1 and 2 was to use /dev/urandom
as follows:
MIN=0
MAX=1234567890
while
rnd=$(cat /dev/urandom | tr -dc 0-9 | fold -w${#MAX} | head -1 | sed 's/^0*//;')
[ -z $rnd ] && rnd=0
(( $rnd < $MIN || $rnd > $MAX ))
do :
done
Basically, just collect randomness from /dev/urandom
(might consider to use /dev/random
instead if a cryptographically strong pseudorandom number generator is desired, and if you have lots of time, or else maybe a hardware random number generator), delete every character that's not a decimal digit, fold the output to the length of $MAX
and cut leading 0's. If we happened to only get 0's then $rnd
is empty, so in this case set rnd
to 0
. Check if the result is outside our range and if so, then repeat. I forced the "body" of the while loop into the guard here so as to force execution of the body at least once, in the spirit of emulating a do ... while
loop, since rnd
is undefined to start with.
I think I fulfilled conditions 1 and 2 here, but now I screwed up condition 3. It's kinda slow. Takes up to a second or so (tenth of a second when I'm lucky). Actually, the loop is not even guaranteed to terminate (although the probability of termination converges to 1 as time increases).
Is there an efficient way to get unbiased random integers, within a pre-specified and potentially large range, in bash? (I'll continue to investigate when time allows, but in the meantime I thought someone here might have a cool idea!)
Table of Answers
-
The most basic (and hence portable) idea is to generate a random bitstring just long enough. There are different ways of generating a random bitstring, either using bash's built-in
$RANDOM
variable or usingod
and/dev/urandom
(or/dev/random
). If the random number is greater than$MAX
, start over. -
Alternatively, it is possible to use external tools.
- The Perl solution
- Pro: quite portable, simple, flexible
- Contra: not for very large numbers above 232-1
- The Python solution
- Pro: simple, flexible, works even for large numbers
- Contra: less portable
- The zsh solution
- Pro: good for people who use zsh anyway
- Contra: probably even less portable
- The Perl solution
Best Answer
I see another interesting method from here.
This one also seems to be a good option. It reads 4 bytes from the random device and formats them as unsigned integer between
0
and2^32-1
.