Postgresql – Is “TABLESAMPLE BERNOULLI(1)” not very random at all

postgresql

Whenever I need to return a random record from my tables and performance matters, instead of:

SELECT column FROM table ORDER BY random() LIMIT 1;

I always do:

SELECT column FROM table TABLESAMPLE BERNOULLI(1) LIMIT 1;

This is much faster but it seems like it's not very random? It looks like a lot of the same records get returned when using this method repeatedly. Is it just me or is this method much less random (and less useful because of that)?

Best Answer

The probability of a row to be returned from TABLESAMPLE BERNOULLI(1) is 1/100, that is, 0.01. The presence of LIMIT 1 tells Postgres to stop reading the table after the first match. This means that on average one of the first 100 rows (in whichever order they are scanned by the engine) in the table will be retrieved.

The probability of any particular row to be returned after ORDER BY random() LIMIT 1 is, on the other hand, 1/C, where C is the table cardinality. The entire table will be read and randomly ordered, then one row returned. This means that if your table has more than 100 rows, the ORDER BY random() will return a row from a wider selection (or, as you put it, will be "more random").

Some background info here.