PostgreSQL randomising combinations with LATERAL

postgresqlrandomsubquery

In the following example I have a table foo from which I'd like to pick out at random a row per group.

CREATE TABLE foo (
  line INT
);
INSERT INTO foo (line)
SELECT generate_series(0, 999, 1);

Let's say that I'd like to group by line % 10. I could do this with:

SELECT DISTINCT ON (bin) bin, line
FROM (
    SELECT line, line % 10 AS bin, random() x
    FROM foo
    ORDER BY x
) X

What I'd like to do is get random picks from each bin several times. I had thought I'd be able to do this with generate_series() and LATERAL

SELECT i, line, bin
FROM
(
 SELECT generate_series(1,3) i
) m,
LATERAL
(SELECT DISTINCT ON (bin) bin, line
FROM (
    SELECT line, line % 10 bin, random() x
    FROM foo
    ORDER BY x
) X
ORDER BY bin) Q
ORDER BY bin, i;

However, when I do this in PostgreSQL 9.5 I find I get the same line for a given bin for each iteration i, e.g.,

i;line;bin
1;530;0
2;530;0
3;530;0
1;611;1
2;611;1
3;611;1
...

I'm confused, as I thought the subquery containing the random() would be run differently for each line from the generate_series().

EDIT: I realised that I can achieve the same objective by generating more combinations and choosing from these with

SELECT DISTINCT ON (bin, round) round, bin, line
FROM (
    SELECT line, line % 10 as bin, round
    FROM foo, generate_series(1,3) round
    ORDER BY bin, random()
) X;

So my question is simply why didn't the first way work?

EDIT: The problem appears to be that LATERAL only acts like a for-loop if the subqueries are correlated in some way (thanks to @ypercube's comment). Hence my original approach can be fixed by adding the following small change

SELECT i, line, bin
FROM
(
 SELECT generate_series(1,3) i
) m,
LATERAL
(
SELECT DISTINCT ON (bin) bin, line
FROM (
    SELECT line, line % 10 bin, m.i, random() x -- <NOTE m.i HERE
    FROM foo
    ORDER BY x
) X
ORDER BY bin
LIMIT 3
) Q
ORDER BY bin, i;

Best Answer

I'd write the query like this, using LIMIT (3) instead of DISTINCT ON.

The generate_series(0, 9) is used to get all the distinct bins. You could use (SELECT DISTINCT line % 10 FROM foo) AS g (bin) instead, if the "bins" are not all the integers from 0 up to 9:

SELECT 
    g.bin, 
    ROW_NUMBER() OVER (PARTITION BY g.bin ORDER BY d.x) AS i,
    d.* 
FROM 
    generate_series(0, 9) AS g (bin), 
  LATERAL 
    ( SELECT f.*, random() x 
      FROM foo AS f 
      WHERE  f.line % 10 = g.bin 
      ORDER BY x 
      LIMIT 3
    ) AS d
ORDER BY 
    bin, x ;

Also, if you don't need the random() number in the output, you could use ORDER BY random() in the subquery and remove x from the select and order by clauses - or replace ORDER BY d.x with ORDER BY d.line.