Unevenly-Spread Results Using $RANDOM – Why and How to Fix


I read about RNGs on Wikipedia and about the $RANDOM function on TLDP, but neither really explains this result:

$ max=$((6*3600))
$ for f in {1..100000}; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
  21787 0
  22114 1
  21933 2
  12157 3
  10938 4
  11071 5

Why are the values above about twice as likely to be 0, 1, 2 as 3, 4, 5, while after I change the modulus they are spread almost equally over all 9 values?

$ max=$((9*3600))
$ for f in {1..100000}; do echo $(($RANDOM%max/3600)); done | sort | uniq -c
  11940 0
  11199 1
  10898 2
  10945 3
  11239 4
  10928 5
  10875 6
  10759 7
  11217 8

Best Answer

To expand on the topic of modulo bias, your formula is:

max=$((6*3600))
$(($RANDOM%max/3600))

And in this formula, $RANDOM is a random value in the range 0-32767.

   RANDOM Each time this parameter is referenced, a random integer between
          0 and 32767 is generated.

It helps to visualize how this maps to possible values:

0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
0 = 21600-25199
1 = 25200-28799
2 = 28800-32399
3 = 32400-32767

So in your formula, 0, 1 and 2 are each twice as likely as 4 or 5, and 3 is slightly more likely than 4 or 5 as well, because it also picks up the 368 leftover values 32400-32767. Hence your result with 0, 1, 2 as winners and 4, 5 as losers.
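The mapping above can also be counted exactly instead of sampled. The following is a minimal bash sketch that walks through every possible $RANDOM value once and tallies the result of the formula; `bucket_counts` is a hypothetical helper name, not something from the question.

```shell
#!/usr/bin/env bash
# bucket_counts MAX DIV (hypothetical helper): for every possible $RANDOM
# value (0..32767), compute value % MAX / DIV and print "result count".
bucket_counts() {
    local max=$1 div=$2 v i
    local -a counts=()
    for ((v = 0; v < 32768; v++)); do
        i=$(( v % max / div ))
        counts[i]=$(( ${counts[i]:-0} + 1 ))
    done
    # Indexed arrays list their keys in ascending order.
    for i in "${!counts[@]}"; do
        echo "$i ${counts[i]}"
    done
}

bucket_counts $((6*3600)) 3600
```

For 6*3600 this reports 7200 source values each for 0, 1 and 2, 3968 for 3, and 3600 each for 4 and 5, which matches the roughly 2:1 ratio in the test run.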

When changing to 9*3600, it turns out as:

0 = 0-3599
1 = 3600-7199
2 = 7200-10799
3 = 10800-14399
4 = 14400-17999
5 = 18000-21599
6 = 21600-25199
7 = 25200-28799
8 = 28800-32399
0 = 32400-32767

1-8 all have the same probability, but there is still a slight bias toward 0, and hence 0 was still the winner in your test with 100,000 iterations.

To fix the modulo bias, first simplify the formula: if you only want 0-5, the modulus is 6, and there is no point in using 3600 or an even larger number. This simplification alone reduces the bias a lot (32766 maps to 0 and 32767 maps to 1, which gives only a tiny bias toward those two numbers).
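How small the remaining bias is follows from dividing the 32768 possible values by the modulus. A short sketch of that arithmetic, where `modulo_bias` is a hypothetical helper name introduced only for illustration:

```shell
#!/usr/bin/env bash
# modulo_bias N (hypothetical helper): show how the 32768 possible $RANDOM
# values split across the N results of $RANDOM % N.
modulo_bias() {
    local n=$1
    local base=$(( 32768 / n ))    # hits that every result gets
    local extra=$(( 32768 % n ))   # the lowest 'extra' results get one more hit
    echo "base=$base extra=$extra"
}

modulo_bias 6
```

With a modulus of 6 this prints `base=5461 extra=2`: the results 0 and 1 are each hit by 5462 source values and the results 2 to 5 by 5461.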

To get rid of the bias altogether, you need to re-roll, for example whenever $RANDOM is lower than 32768 % 6 (that is, discard the states that do not map evenly onto the available random range).

max=6
for f in {1..100000}
do
    r=$RANDOM
    # re-roll the (32768 % max) lowest values so the rest divides evenly by max
    while [ "$r" -lt $((32768 % max)) ]; do r=$RANDOM; done
    echo $((r % max))
done | sort | uniq -c | sort -n

Test result:

The alternative would be to use a different random source that has no noticeable bias (one with orders of magnitude more than just 32768 possible values). But implementing re-roll logic anyway doesn't hurt (even if the re-roll is likely never triggered).
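As an illustration of that alternative, here is a minimal sketch that draws 32-bit values from /dev/urandom with od and still applies the same re-roll logic. The function name `urandom_below` is hypothetical, and the sketch assumes /dev/urandom exists, that the shell has 64-bit arithmetic, and that N is between 1 and 2^32.

```shell
#!/usr/bin/env bash
# urandom_below N (hypothetical helper): print an integer in 0..N-1 drawn
# from /dev/urandom. Assumes 1 <= N <= 4294967296 and 64-bit shell arithmetic.
urandom_below() {
    local n=$1 range=4294967296 r   # 2^32 possible values per draw
    local reject=$(( range % n ))   # the lowest 'reject' values would cause bias
    while :; do
        r=$(od -An -N4 -tu4 /dev/urandom)   # one unsigned 32-bit integer
        r=$(( r ))                          # drop od's padding spaces
        (( r >= reject )) && break          # otherwise re-roll
    done
    echo $(( r % n ))
}

urandom_below 6
```

With 2^32 possible values and a modulus of 6, only 4 values are ever rejected, so the loop almost always finishes on the first draw.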

Related Solutions

Linux – /usr/bin/random using a lot of CPU

Crazy troubleshooting idea: make a honeypot / poor-man's process accounting.

1. Make a backup of /usr/bin/random:

       cp -p /usr/bin/random /usr/bin/random.bak

2. Create a world-writable log file:

       touch /tmp/who_is_calling_random.log ; chmod 622 /tmp/who_is_calling_random.log

3. Replace /usr/bin/random with this shell script (note you can use a different path than /tmp if you need to, but make sure it's world writable), then make it executable with chmod 755 /usr/bin/random.
```
#!/bin/sh
echo "`date` $USER $$ $@" >> /tmp/who_is_calling_random.log
/usr/bin/random.bak "$@"
```
4. Reboot the system.
5. See what gathers in the honeypot log. This should be a log of who/what is behind the use of the random program.
```
tail -f /tmp/who_is_calling_random.log
```
6. Restore random from the backup you made in step #1.
7. Reboot the system.

Linux – Why writing to /dev/random does not make parallel reading from /dev/random faster

You can write to /dev/random because that is part of the way to provide extra random bytes to it, but writing alone is not sufficient: you also have to notify the system that there is additional entropy via an ioctl() call.

I needed the same functionality for testing my smartcard setup program, as I did not want to wait for my mouse/keyboard to generate enough entropy for the several calls to gpg that were made for each test run. What I did was run the Python program that follows in parallel with my tests. It should of course not be used at all for real gpg key generation, as the random string is not random at all (system-generated random info will still be interleaved). If you have an external source to set the string for random, then you should be able to have high entropy. You can check the entropy with:

cat /proc/sys/kernel/random/entropy_avail

The program:

#!/usr/bin/env python3
# For testing purposes only
# DO NOT USE THIS, THIS DOES NOT PROVIDE ENTROPY TO /dev/random, JUST BYTES

import fcntl
import struct
import time

RNDADDENTROPY = 0x40085203

# Fixed, non-random payload. struct.pack needs bytes, and "32s" keeps the first 32.
payload = b"3420348024823049823-984230942049832423l4j2l42j"

while True:
    # Fields: entropy count (8), buffer size in bytes (32), then the buffer itself
    t = struct.pack("ii32s", 8, 32, payload)
    with open("/dev/random", mode="wb") as fp:
        # as fp has a method fileno(), you can pass it to ioctl
        fcntl.ioctl(fp, RNDADDENTROPY, t)
    time.sleep(0.001)

(Don't forget to kill the program after you are done.)
