Postgresql – Insert values from a record variable into a subclass table

dynamic-sql parameter plpgsql postgresql

My goal is to create a stored procedure that builds up a _tbl_out which is a subclass of a _tbl_in. I'm following the results of this question, but for this to work, the tables need to have the SAME schema. I want _tbl_out to have the same schema PLUS some extra computed columns.

Will I have to use a less elegant solution than the one linked? Or is there a solution where I can append columns onto the _tbl_in record before I insert it to the subclass _tbl_out?

Here is the code for the previous solution for reference:

CREATE OR REPLACE FUNCTION gesio(_tbl_in anyelement, _tbl_out regclass)
  RETURNS void AS
$func$
BEGIN

FOR _tbl_in IN EXECUTE
   format('SELECT * FROM %s', pg_typeof(_tbl_in))
LOOP
   -- do something with record

   EXECUTE format('INSERT INTO %s SELECT $1.*', _tbl_out)
   USING _tbl_in;
END LOOP;

END
$func$  LANGUAGE plpgsql;

Call (important!):

SELECT gesio(NULL::t, 't1');

t and t1 being the tables with identical schema.

Best Answer

Your goal is ...

a stored procedure that builds up a _tbl_out which is a subclass of a _tbl_in.

And by "subclass" you mean there are additional columns at the end of _tbl_out.

So the function shall take two table names and copy contents from the first to the second - plus one or more additional (computed?) columns. The only thing we know about table definitions: the second has additional dangling columns.

You only need a polymorphic parameter (ANYELEMENT) to pass in or return values of dynamic type - or at least work with a variable of the type in the function body. What you demonstrate does not return anything (RETURNS void) and also has no need for the input value or type. You only use the table name derived from the input parameter and the parameter itself as auxiliary variable - which goes away after simplifying the function.

Unless you have computations depending on the type or value of the polymorphic parameter, two regclass parameters and a single dynamic statement can do the job, and much faster, too. (Or even just two text parameters passing valid table names, but regclass is more reliable):

CREATE OR REPLACE FUNCTION pg_temp.gesio(_tbl_in regclass, _tbl_out regclass)
  RETURNS void AS
$func$
BEGIN
   EXECUTE format($$
      INSERT INTO %s       -- *no* target column list
      SELECT *
           , 'some value'  -- AS computed_column1  -- column alias only for documentation
          -- more ?
      FROM   %s$$          -- ORDER BY ???
    , _tbl_out, _tbl_in);
END
$func$  LANGUAGE plpgsql;
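
A quick way to try the function above is sketched below. The tables t and t1 and their columns are made up for illustration; the only assumption that matters is that t1 has the same columns as t plus one trailing text column to receive the computed value.

```sql
-- Hypothetical demo tables; names and columns are assumptions for illustration.
CREATE TEMP TABLE t  (id int, val text);
CREATE TEMP TABLE t1 (id int, val text, computed_column1 text);  -- same columns as t plus one trailing column

INSERT INTO t VALUES (1, 'a'), (2, 'b');

-- Table names are cast to regclass, so a misspelled name fails right at the call.
-- The function lives in pg_temp and has to be called schema-qualified.
SELECT pg_temp.gesio('t', 't1');

-- Every row of t should now be in t1, with 'some value' in the extra column.
SELECT * FROM t1 ORDER BY id;
```

Since the dynamic INSERT has no target column list, values are assigned by position. The column order of t1 has to match the order of the SELECT list.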

The old answer of mine that you referenced was not clear enough on this point, so I added a note to it:

Insert values from a record variable into a table

Performance with big tables

A major factor for the resulting cost is the Write-Ahead Log (WAL) that has to be written for this form of INSERT. There are ways to avoid that completely and make it substantially faster: write to a completely new table within the same transaction, so it's invisible to the rest of the world until committed. Details:

What causes large INSERT to slow down and disk usage to explode?
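
A minimal sketch of that idea, assuming a source table t and a hypothetical new table name t_new (neither comes from the linked answer). As far as I know, WAL is only skipped for a table created in the same transaction when the server runs with wal_level = minimal, so treat the speed-up as dependent on your configuration.

```sql
BEGIN;

-- t_new is a hypothetical name; t stands in for the source table.
-- The new table is created and filled in one step and stays invisible
-- to other sessions until COMMIT.
CREATE TABLE t_new AS
SELECT *
     , 'some value'::text AS computed_column1  -- computed column, as in the function above
FROM   t;

-- Indexes and constraints are not copied by CREATE TABLE AS;
-- add the ones you need here, before committing.

COMMIT;
```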

Aside: It's a function, not a stored procedure. Before version 11, Postgres did not have stored procedures, strictly speaking; CREATE PROCEDURE was added in Postgres 11. (The most notable difference: functions always run inside a single transaction scope.)

Related Solutions

Postgresql – Insert values from a record variable into a table

Major ingredients

Implicit cursor of a FOR loop instead of explicit cursor. That's generally preferable.
Polymorphic types
Object identifier types
Dynamic SQL in plpgsql
VALUES can take a row type directly.

An obstacle to overcome is that variables inside the function cannot be defined as polymorphic type anyelement (yet). This related answer on SO explains the solution. It also provides a workaround for older versions.

I am handing in a NULL value of type t, which serves three purposes:

Provide table name.
Provide table type.
Serve as loop variable.

The value of the first parameter is discarded. Use NULL.

Consider this related answer on SO with more details. The most interesting part is the last chapter, "Various complete table types".

SQL Fiddle demo.

If your computations are not too sophisticated, you may be able to replace the loop with a single dynamic SQL statement, which is typically faster.

Postgresql – Using dynamic column names in PostgreSQL

RETURN QUERY EXECUTE was introduced with Postgres 8.4.
Your version is just too old and unsupported by now. Upgrade to a more recent version.

Also, dynamic column names in the result are very hard to come by. It's a principle of SQL that it wants to know the return type - including the names - up front.

Returning anonymous records without a column definition list only works for a single record. Else you have to provide a column definition list in the call. Details under this question on SO:
Return multiple fields as a record in PostgreSQL with PL/pgSQL
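
To illustrate what a column definition list in the call looks like, here is a small sketch. The function name f_anon and the column names are hypothetical and not taken from the question.

```sql
-- Hypothetical function and column names, for illustration only.
CREATE OR REPLACE FUNCTION pg_temp.f_anon()
  RETURNS SETOF record AS
$func$
BEGIN
   RETURN QUERY SELECT 1, 'foo'::text;
END
$func$ LANGUAGE plpgsql;

-- The call has to declare names and types of the returned columns.
-- Without the list after AS, Postgres rejects the query.
SELECT * FROM pg_temp.f_anon() AS t(id int, name text);
```

The names in the list are chosen by the caller, which is about as close to "dynamic column names" as a function returning anonymous records gets.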

There are limited ways around this with polymorphic types. Advanced stuff:
Refactor a PL/pgSQL function to return the output of various SELECT queries

BTW, your function would look like this in modern PL/pgSQL:

CREATE OR REPLACE FUNCTION get_policy_name (
  _id        int,
  _lang      text,
  _def_value text
) RETURNS TABLE (col text) AS
$func$
BEGIN
   RETURN QUERY EXECUTE
   format('SELECT COALESCE(%I, col, $1)
           FROM my_table WHERE id = $2'
         , 'col_' || _lang)
   USING _def_value, _id;
END
$func$ LANGUAGE plpgsql;

Call:

SELECT col AS col_fr FROM get_policy_name (1, 'fr', 'foo');

Here, I am simply using a column alias in the call to achieve what you want. Much easier than dynamic column names ...
