Cross-Product Between Table Column and Input Values in PostgreSQL

join; postgresql; set-returning-functions

I seem to be unable to write an SQL query that computes the cross-product between a table column and a set of given input values.

Something along the lines of:

WITH {1,2} as Input
Select *
From mTable.column, Input

With mTable.column containing the values 3 and 4, it should return:

1,3
1,4
2,3
2,4

Is there any way to achieve this?

Best Answer

In other RDBMS (like SQL Server before 2008 - as per Paul's comment) one might cross join to a subquery with UNION ALL SELECT, but there are more convenient and efficient options in Postgres.

And you don't need a CTE for this. You can use it, but it has no performance benefit.
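For completeness, the CTE form closest to the pseudocode in the question might look like the following. This is only a sketch: the temp table is a stand-in for the real mTable (holding 3 and 4, as in the question), and input_vals is a made-up name.

```sql
-- Hypothetical stand-in for the question's table (values 3 and 4).
CREATE TEMP TABLE mTable (col1 int);
INSERT INTO mTable (col1) VALUES (3), (4);

-- The input set lives in a CTE; the column alias i is declared in the CTE header.
WITH input_vals(i) AS (
   VALUES (1), (2)
)
SELECT v.i, m.col1
FROM   mTable m
CROSS  JOIN input_vals v
ORDER  BY v.i, m.col1;
```

With that data it returns the four rows asked for in the question: (1,3), (1,4), (2,3), (2,4).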

1. Provide a set with VALUES:

Quoting the manual: VALUES computes a row value or set of row values specified by value expressions. It is most commonly used to generate a "constant table" within a larger command, but it can be used on its own.
```
SELECT t.i, m.col1
FROM   mTable m
CROSS  JOIN (VALUES (1), (2)) t(i);
```

2. Provide an array and unnest():

2a. With an array constructor:

```
SELECT i, m.col1
FROM   mTable m
CROSS  JOIN unnest(ARRAY[1,2]) i;
```

2b. With an array literal:

```
SELECT i, m.col1
FROM   mTable m
CROSS  JOIN unnest('{1,2}'::int[]) i;
```

Add ORDER BY i, m.col1 if you need the sort order in your result.
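If the input values arrive from a client as a single parameter, the array form can be wrapped in a prepared statement. A minimal sketch, assuming a throwaway temp table in place of the real mTable; the statement name cross_input is made up for illustration.

```sql
-- Hypothetical stand-in for the question's table (values 3 and 4).
CREATE TEMP TABLE mTable (col1 int);
INSERT INTO mTable (col1) VALUES (3), (4);

-- One int[] parameter carries the whole input set.
PREPARE cross_input(int[]) AS
SELECT i, m.col1
FROM   mTable m
CROSS  JOIN unnest($1) i
ORDER  BY i, m.col1;

-- The untyped literal is coerced to int[] by the parameter declaration.
EXECUTE cross_input('{1,2}');
```

With that data, the EXECUTE returns (1,3), (1,4), (2,3), (2,4). The array can have any length, so the query text does not change with the number of input values.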

About row and array syntax, see "Array of strings when updating a field".

Related Solutions

PostgreSQL Self-Join – How to Create Unique Pairs

Procedural solution with PL/pgSQL

```
CREATE OR REPLACE FUNCTION f_next_round()
  RETURNS TABLE (player_id1 int, player_id2 int) AS
$func$
DECLARE
   rows int := (SELECT count(*)/2 FROM tbl);  -- expected number of resulting rows
   ct   int := 0;                             -- running count
BEGIN

CREATE TEMP TABLE t ON COMMIT DROP AS         -- possible combinations
SELECT t1.player_id AS p1, t2.player_id AS p2
     , COALESCE(array_length(t1.opp_log,1), 0) AS len1
     , COALESCE(array_length(t2.opp_log,1), 0) AS len2
FROM   tbl t1, tbl t2 
WHERE  t2.player_id <> t1.player_id
AND    t2.player_id <> ALL (t1.opp_log)
AND    t1.player_id <> ALL (t2.opp_log)
ORDER  BY len1 DESC, len2 DESC;               -- opportune sort order

LOOP
   SELECT INTO player_id1, player_id2  p1, p2 FROM t LIMIT 1;

   EXIT WHEN NOT FOUND;
   RETURN NEXT;
   ct := ct + 1;                              -- running count

   DELETE FROM t                              -- remove obsolete pairs
   WHERE  p1 IN (player_id1, player_id2) OR 
          p2 IN (player_id1, player_id2);
END LOOP;

IF ct < rows THEN
   RAISE EXCEPTION 'Could not find a solution';
ELSIF ct > rows THEN
   RAISE EXCEPTION 'Impossible result!';
END IF;

END
$func$  LANGUAGE plpgsql VOLATILE;
```

How?

Build a temporary table with remaining possible pairs. This kind of cross join produces a lot of rows with big tables, but since we seem to be talking about tournaments, numbers should be reasonably low.

Players with the longest list of opponents are sorted first. This way, players that would be hard to match come first, increasing the chance for a solution.

Pick the first row and delete related pairings that are now obsolete. No need to sort again. Logically any row is good; practically we get the player with the longest list of opponents first due to the initial sort (which is not reliable without ORDER BY, but good enough for the case).

Repeat until no match is left.
Keep count and raise an exception if the count is not as expected. PL/pgSQL conveniently allows raising an exception after the fact, which cancels any previous return values. Details in the manual.
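The function reads from a table tbl that is not shown here. Judging from the function body, it needs a player_id column and an integer array opp_log of opponents already played. A minimal sketch of such a table, with made-up sample data (not the data behind the result shown for the call), plus the candidate-pair query on its own:

```sql
-- Hypothetical schema inferred from the function body: one row per player,
-- opp_log holds the IDs of opponents already played.
CREATE TEMP TABLE tbl (
   player_id int PRIMARY KEY
 , opp_log   int[] NOT NULL DEFAULT '{}'
);

-- Made-up sample data: 4 players, round 1 was played as 1-2 and 3-4.
INSERT INTO tbl (player_id, opp_log) VALUES
   (1, '{2}')
 , (2, '{1}')
 , (3, '{4}')
 , (4, '{3}');

-- Candidate pairs for the next round: every pairing not yet played,
-- listed in both directions (p1, p2) and (p2, p1).
SELECT t1.player_id AS p1, t2.player_id AS p2
FROM   tbl t1, tbl t2
WHERE  t2.player_id <> t1.player_id
AND    t2.player_id <> ALL (t1.opp_log)
AND    t1.player_id <> ALL (t2.opp_log)
ORDER  BY p1, p2;
```

With this sample data the query yields 8 candidate rows. The NOT NULL constraint is an assumption that matters: with a NULL opp_log, the <> ALL test evaluates to NULL and the player would silently drop out of all pairings.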

Call:

```
SELECT * FROM f_next_round();
```

Result:

```
player_id1 | player_id2
-----------+-----------
1          | 7
2          | 3
4          | 8
5          | 6
```


Note

This is not guaranteed to find the perfect solution. It just returns a possible solution and uses some limited smarts to improve the chances of finding one. The problem is a bit like solving a Sudoku, really, and is not trivially solved perfectly.

PostgreSQL Naming Conflict – Function Parameter vs JOIN USING Clause

According to the manual chapter "PL/pgSQL Under the Hood", you can use the configuration parameter plpgsql.variable_conflict before creating the function, or the special command #variable_conflict at the start of the function body, to declare how such conflicts should be resolved.
The three possible settings are error (the default), use_variable and use_column:

```
CREATE OR REPLACE FUNCTION f_merge_foobar()
  RETURNS TABLE(ts int, foo text, bar text)
  LANGUAGE plpgsql AS
$func$
#variable_conflict use_column             -- how to resolve conflicts
BEGIN
   FOR ts, foo, bar IN
      SELECT ts, f.foo, b.bar
      FROM   foo f
      FULL   JOIN bar b USING (ts)
   LOOP
      -- do something
      RETURN NEXT;
   END LOOP;
END
$func$;
```
