I read about RNGs on Wikipedia and $RANDOM
function on TLDP but it doesn't really explain this result:
$ max=$((6*3600))
$ for f in {1..100000}; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
21787 0
22114 1
21933 2
12157 3
10938 4
11071 5
Why are the values above about 2x more inclined to be 0, 1, 2 than 3, 4, 5 but when I change the max modulo they're almost equally spread over all 10 values?
$ max=$((9*3600))
$ for f in {1..100000}; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
11940 0
11199 1
10898 2
10945 3
11239 4
10928 5
10875 6
10759 7
11217 8
Best Answer
To expand on the topic of modulo bias, your formula is:
And in this formula,
$RANDOM
is a random value in the range 0-32767.It helps to visualize how this maps to possible values:
So in your formula, the probability for 0, 1, 2 is twice that of 4, 5. And probability of 3 is slightly higher than 4, 5 too. Hence your result with 0, 1, 2 as winners and 4, 5 as losers.
When changing to
9*3600
, it turns out as:1-8 have the same probability, but there is still a slight bias for 0, and hence 0 was still the winner in your test with 100'000 iterations.
To fix the modulo bias, you should first simplify the formula (if you only want 0-5 then the modulo is 6, not 3600 or even crazier number, no sense in that). This simplification alone will reduce your bias by a lot (32766 maps to 0, 32767 to 1 giving a tiny bias to those two numbers).
To get rid of bias altogether, you need to re-roll, (for example) when
$RANDOM
is lower than32768 % 6
(eliminate the states that do not map perfectly to available random range).Test result:
The alternative would be using a different random source that does not have noticable bias (orders of magnitude larger than just 32768 possible values). But implementing a re-roll logic anyway doesn't hurt (even if it likely never comes to pass).