Bash – How to Efficiently Generate Large, Uniformly Distributed, Random Integers in Bash

bashcommand lineshell-script

I have been wondering what would be the best way to get good randomness in bash, i.e., what would be a procedure to get a random positive integer between MIN and MAX such that

  1. The range can be arbitrarily large (or at least, say, up to 232-1);
  2. Values are uniformly distributed (i.e., no bias);
  3. It is efficient.

An efficient way to get randomness in bash is to use the $RANDOM variable. However, this only samples a value between 0 and 215-1, which may not be large enough for all purposes. People typically use a modulo to get it into the range they want, e.g.,

MIN=0
MAX=12345
rnd=$(( $RANDOM % ($MAX + 1 - $MIN) + $MIN ))

This, additionally, creates a bias unless $MAX happens to divide 215-1=32767. E.g., if $MIN is 0 and $MAX is 9, then the values 0 through 7 are slightly more probable than the values 8 and 9, as $RANDOM will never be 32768 or 32769. This bias gets worse as the range increases, e.g., if $MIN is 0 and $MAX is 9999, then the numbers 0 through 2767 have a probability of 4/32767, while the numbers 2768 through 9999 only have a probability of 3/32767.

So while the above method fulfills condition 3, it does not fulfill conditions 1 and 2.

The best method that I came up with so far in trying to satisfy conditions 1 and 2 was to use /dev/urandom as follows:

MIN=0
MAX=1234567890
while
  rnd=$(cat /dev/urandom | tr -dc 0-9 | fold -w${#MAX} | head -1 | sed 's/^0*//;')
  [ -z $rnd ] && rnd=0
  (( $rnd < $MIN || $rnd > $MAX ))
do :
done

Basically, just collect randomness from /dev/urandom (might consider to use /dev/random instead if a cryptographically strong pseudorandom number generator is desired, and if you have lots of time, or else maybe a hardware random number generator), delete every character that's not a decimal digit, fold the output to the length of $MAX and cut leading 0's. If we happened to only get 0's then $rnd is empty, so in this case set rnd to 0. Check if the result is outside our range and if so, then repeat. I forced the "body" of the while loop into the guard here so as to force execution of the body at least once, in the spirit of emulating a do ... while loop, since rnd is undefined to start with.

I think I fulfilled conditions 1 and 2 here, but now I screwed up condition 3. It's kinda slow. Takes up to a second or so (tenth of a second when I'm lucky). Actually, the loop is not even guaranteed to terminate (although the probability of termination converges to 1 as time increases).

Is there an efficient way to get unbiased random integers, within a pre-specified and potentially large range, in bash? (I'll continue to investigate when time allows, but in the meantime I thought someone here might have a cool idea!)

Table of Answers

  1. The most basic (and hence portable) idea is to generate a random bitstring just long enough. There are different ways of generating a random bitstring, either using bash's built-in $RANDOM variable or using od and /dev/urandom (or /dev/random). If the random number is greater than $MAX, start over.

  2. Alternatively, it is possible to use external tools.

    • The Perl solution
      • Pro: quite portable, simple, flexible
      • Contra: not for very large numbers above 232-1
    • The Python solution
      • Pro: simple, flexible, works even for large numbers
      • Contra: less portable
    • The zsh solution
      • Pro: good for people who use zsh anyway
      • Contra: probably even less portable

Best Answer

I see another interesting method from here.

rand=$(openssl rand 4 | od -DAn)

This one also seems to be a good option. It reads 4 bytes from the random device and formats them as unsigned integer between 0 and 2^32-1.

rand=$(od -N 4 -t uL -An /dev/urandom | tr -d " ")
Related Question