Postgresql – Create a new table (multiple columns) by selecting random values from an existing table

ctepostgresqlrandomtableupdate

I am stuck on the following task (PostgreSQL 9.3). Let's say we have the following table1 (which has 10k rows):

table1:

id
754
800
330
4
59

My goal is to create another table2 with two columns (source, target), where the values of both columns are a random selection of table1.id values. For example:

table2:

source | target
754    | 59
4      | 4
59     | 330

This is what I've done:

CREATE TABLE table2
(
  id serial NOT NULL,
  source integer,
  target integer,
  distance double precision
);

-- Select 300 table1.id values and insertion into table2.source  
INSERT INTO table2(source)
SELECT id FROM table1 ORDER BY RANDOM() LIMIT 300;

-- Select 300 table1.id values and updating table2.target 
UPDATE table2 SET target = i.id
FROM (SELECT id FROM table1 ORDER BY RANDOM() LIMIT 300) i;

I got the following result:

source | target
754    | 59
330    | 59
800    | 59

Unfortunately all the table2.target values are all the same. How can I update table2.target with different random values (like in the example) ? Or maybe UPDATE is not the good way for doing this?

Best Answer

I suggest a "data-modifying CTE":

WITH cte AS (
   SELECT *, row_number() OVER () AS rn
   FROM  (
      SELECT id
      FROM   tbl
      ORDER  BY random()
      LIMIT  600  -- 2 x 300
      ) sub
   )
INSERT INTO table2(source, target)
SELECT c1.id, c2.id
FROM   cte c1
JOIN   cte c2 ON c2.rn = c1.rn + 300;

In the CTE:

  1. select 600 random rows (to create 300 new rows)
  2. add a row number in the outer SELECT.

Then couple two values in a self join with 300 offset.

To get random rows from a huge table cheaply, consider:
Best way to select random rows PostgreSQL