Postgresql – How to generate a pivoted CROSS JOIN where the resulting table definition is unknown

dynamic-sqlpivotpostgresql

Given two tables with an undefined row count with a name and value, how would I display a pivoted CROSS JOIN of a function over their values.

CREATE TEMP TABLE foo AS
SELECT x::text AS name, x::int
FROM generate_series(1,10) AS t(x);

CREATE TEMP TABLE bar AS
SELECT x::text AS name, x::int
FROM generate_series(1,5) AS t(x);

For example, if that function were multiplication, how would I generate a (multiplication) table like the one below,

Common multiplication table of 1..12

All of those (arg1,arg2,result) rows can be generated with

SELECT foo.name AS arg1, bar.name AS arg2, foo.x*bar.x AS result
FROM foo
CROSS JOIN bar; 

So this is only a question of presentation, I would like this to also work with a custom name — a name that is not simply the argument CASTed to text but set in the table,

CREATE TEMP TABLE foo AS
SELECT chr(x+64) AS name, x::int
FROM generate_series(1,10) AS t(x);

CREATE TEMP TABLE bar AS
SELECT chr(x+72) AS name, x::int
FROM generate_series(1,5) AS t(x);

I think this would be easily do-able with a CROSSTAB capable of a dynamic return-type.

SELECT * FROM crosstab(
  '
    SELECT foo.x AS arg1, bar.x AS arg2, foo.x*bar.x
    FROM foo
    CROSS JOIN bar
  ', 'SELECT DISTINCT name FROM bar'
) AS **MAGIC**

But, without the **MAGIC**, I get

ERROR:  a column definition list is required for functions returning "record"
LINE 1: SELECT * FROM crosstab(

For reference, using the above examples with names this is something more like what tablefunc's crosstab() wants.

SELECT * FROM crosstab(
  '
    SELECT foo.x AS arg1, bar.x AS arg2, foo.x*bar.x
    FROM foo
    CROSS JOIN bar
  '
) AS t(row int, i int, j int, k int, l int, m int);

But, now we're back to making assumptions about the content and size of the bar table in our example. So if,

  1. The tables are of undefined length,
  2. Then the cross-join represents a cube of undefined dimension (because of above),
  3. The catagory-names (cross-tab parlance) are in the table

What's the best we can do in PostgreSQL without a "column definition list" to generate that kind of presentation?

Best Answer

Simple case, static SQL

The non-dynamic solution with crosstab() for the simple case:

SELECT * FROM crosstab(
  'SELECT b.x, f.name, f.x * b.x AS prod
   FROM   foo f, bar b
   ORDER  BY 1, 2'
   ) AS ct (x int, "A" int, "B" int, "C" int, "D" int, "E" int
                 , "F" int, "G" int, "H" int, "I" int, "J" int);

I order resulting columns by foo.name, not foo.x. Both happen to be sorted in parallel, but that's just the simple setup. Pick the right sort order for your case. The actual value of the second column is irrelevant in this query (1-parameter form of crosstab()).

We don't even need crosstab() with 2 parameters because there are no missing values by definition. See:

(You fixed the crosstab query in the question by replacing foo with bar in a later edit. This also fixes the query, but keeps working with names from foo.)

Unknown return type, dynamic SQL

Column names and types cannot be dynamic. SQL demands to know number, names and types of resulting columns at call time. Either by explicit declaration or from information in the system catalogs (That's what happens with SELECT * FROM tbl: Postgres looks up the registered table definition.)

You want Postgres to derive resulting columns from data in a user table. Not going to happen.

One way or the other, you need two round trips to the server. Either you create a cursor and then walk through it. Or you create a temp table and then select from it. Or you register a type and use it in the call.

Or you simply generate the query in one step and execute it in the next:

SELECT $$SELECT * FROM crosstab(
  'SELECT b.x, f.name, f.x * b.x AS prod
   FROM   foo f, bar b
   ORDER  BY 1, 2'
   ) AS ct (x int, $$
 || string_agg(quote_ident(name), ' int, ' ORDER BY name) || ' int)'
FROM   foo;

This generates the query above, dynamically. Execute it in the next step.

I am using dollar-quotes ($$) to keep handling of nested quotes simple. See:

quote_ident() is essential to escape otherwise illegal column names (and possibly defend against SQL injection).

Related: