PostgreSQL – Concatenation of setof type or setof record


I use PostgreSQL 9.1 on Ubuntu 12.04.

In a PL/pgSQL function I am trying to concatenate the setof results returned by another function.

The type pair_id_value in question is created with create type pair_id_value as (id bigint, value integer);

The function that returns the elementary setof pair_id_value (the sets that will be concatenated later) is this one:

create or replace function compute_pair_id_value(id bigint, value integer)
    returns setof pair_id_value
as $$
    listResults = []
    for x in range(0,value+1):
        listResults.append({ "id": id, "value": x})
    return listResults
$$
language plpython3u;

This straightforward PL/Python code should be fine. For example, the query select * from compute_pair_id_value(1712437,2); returns, as expected:

  id            | value 
 ---------------+-----------
        1712437 |         0
        1712437 |         1
        1712437 |         2
 (3 rows)

This Python function is deliberately simple for now: it serves this example and, above all, my proof of concept. It will become more complex in the near future.

The problem arises when I try to concatenate the result tables for multiple ids.

create or replace function compute_all_pair_id_value(id_obj bigint)
    returns setof pair_id_value as $$
declare
    pair pair_id_value;
begin
    for pair in (select compute_pair_id_value(t.id, t.obj_value) from my_obj as t where t.id = id_obj)
    loop
            return next pair;
    end loop;
    return; 
end; $$ language plpgsql;

I receive the error: invalid input syntax for integer "(1712437,0)", as if the result were no longer seen as a pair_id_value with two columns but as a single tuple (1712437,0).

So I changed the output type of the function from setof pair_id_value to setof record. If I execute this similar concatenation function:

create or replace function compute_all_pair_id_value(id_obj bigint)
    returns setof record as $$
declare
    pair record;
begin
    for pair in (select compute_pair_id_value(t.id, t.obj_value)  from my_obj as t where t.id = id_obj)
    loop
            return next pair;
    end loop;
    return; 
end; $$ language plpgsql;

I get the error: a column definition list is required for functions returning "record"

Trying to follow the answer to a related SO question, I tried adding a column definition list in the select, like this: select compute_pair_id_value(t.id, t.obj_value) as f(id bigint, value integer). The complete code is here:

create or replace function compute_all_pair_id_value(id_obj bigint)
    returns setof record as $$
declare
    pair record;
begin
    for pair in (select compute_pair_id_value(t.id, t.obj_value) as f(id bigint, value integer) from my_obj as t where t.id = id_obj)
    loop
            return next pair;
    end loop;
    return; 
end; $$ language plpgsql;

But when I run the SQL script, psql refuses to create the function:
syntax error at or near "(" select compute_pair_id_value(t.id, t.obj_value) as f(id bigint, value integer) … with the error position pointing at the f(

Any idea how to do this properly?

Should I consider creating a temporary table to do the job?

Best Answer

The approach you're using is unnecessarily complex - and very inefficient. Instead of the first function use:

create or replace function compute_pair_id_value(id bigint, value integer)
    returns setof pair_id_value
as $$
SELECT $1, generate_series(0,$2);
$$                          
language sql;

or better, get rid of it entirely and write the whole operation like this:

-- Sample data creation:
CREATE TABLE my_obj(id bigint, obj_value integer);
insert into my_obj(id,obj_value) VALUES (1712437,2),(17000,5);

-- and the query:
SELECT id, generate_series(0,obj_value) FROM my_obj;

Resulting in:

regress=> SELECT id, generate_series(0,obj_value) FROM my_obj;
   id    | generate_series 
---------+-----------------
 1712437 |               0
 1712437 |               1
 1712437 |               2
   17000 |               0
   17000 |               1
   17000 |               2
   17000 |               3
   17000 |               4
   17000 |               5
(9 rows)

This exploits PostgreSQL's behaviour with set-returning functions called in the SELECT list. Once PostgreSQL 9.3 comes out it can be replaced with a standards-compliant LATERAL query.
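For reference, here is a rough sketch of what the LATERAL form might look like on PostgreSQL 9.3 or later, using the my_obj sample table and the compute_pair_id_value function defined above. The aliases t, g, p and the column alias value are illustrative choices, not part of the original answer.

```sql
-- Requires PostgreSQL 9.3 or later; LATERAL is not available in 9.1 or 9.2.

-- Equivalent of: SELECT id, generate_series(0, obj_value) FROM my_obj;
SELECT t.id, g.value
FROM my_obj AS t
CROSS JOIN LATERAL generate_series(0, t.obj_value) AS g(value);

-- The same pattern with a set-returning function that returns a composite type.
-- The function appears in FROM, so its columns are available directly.
SELECT p.*
FROM my_obj AS t
CROSS JOIN LATERAL compute_pair_id_value(t.id, t.obj_value) AS p;
```

Because the function is in the FROM clause, it should be evaluated once per row of my_obj. Neither query has an ORDER BY, so the row order is not guaranteed.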

Since it turns out your question was a simplified version of the real problem, let's tackle that. I'll work with the simplified compute_pair_id_value above to avoid the hassle of plpython3. Here's how to do what you want:

SELECT (compute_pair_id_value(id,obj_value)).* FROM my_obj;

Result:

regress=> SELECT (compute_pair_id_value(id,obj_value)).* FROM my_obj;
   id    | value 
---------+-------
 1712437 |     0
 1712437 |     1
 1712437 |     2
   17000 |     0
   17000 |     1
   17000 |     2
   17000 |     3
   17000 |     4
   17000 |     5
(9 rows)

But be warned that with this form compute_pair_id_value will be called more than once per row. This is a limitation of PostgreSQL's query executor that can be avoided in 9.3 with LATERAL support, but as far as I know you're stuck with it in 9.2 and below. Observe:

create or replace function compute_pair_id_value(id bigint, value integer)
    returns setof pair_id_value
as $$
BEGIN
  RAISE NOTICE 'compute_pair_id_value(%,%)',id,value;
  RETURN QUERY SELECT $1, generate_series(0,$2);
END;
$$             
language plpgsql;

output:

regress=> SELECT (compute_pair_id_value(id,obj_value)).* FROM my_obj;
NOTICE:  compute_pair_id_value(1712437,2)
NOTICE:  compute_pair_id_value(1712437,2)
NOTICE:  compute_pair_id_value(17000,5)
NOTICE:  compute_pair_id_value(17000,5)
   id    | value 
---------+-------
 1712437 |     0
 1712437 |     1
 1712437 |     2
   17000 |     0
   17000 |     1
   17000 |     2
   17000 |     3
   17000 |     4
   17000 |     5
(9 rows)

See how compute_pair_id_value is called once per output column?

There is a workaround: Another layer of subquery to unpack the composite type result. See:

regress=> SELECT (val).* FROM (SELECT compute_pair_id_value(id,obj_value) FROM my_obj) x(val);
NOTICE:  compute_pair_id_value(1712437,2)
NOTICE:  compute_pair_id_value(17000,5)
   id    | value 
---------+-------
 1712437 |     0
 1712437 |     1
 1712437 |     2
   17000 |     0
   17000 |     1
   17000 |     2
   17000 |     3
   17000 |     4
   17000 |     5
(9 rows)

You can use the same technique in your code if you really must LOOP over the results (it's slow to do that, so avoid it if you can).
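As a minimal sketch of that, assuming the pair_id_value type, the my_obj table, and the compute_pair_id_value function shown earlier, the concatenation function from the question could be written with the extra subquery layer like this (the aliases x and val are just the ones used in the query above):

```sql
create or replace function compute_all_pair_id_value(id_obj bigint)
    returns setof pair_id_value as $$
declare
    pair pair_id_value;
begin
    -- The inner query yields one composite column per row; the outer
    -- (val).* expands it into the id and value columns, so each row
    -- matches the fields of the pair variable.
    for pair in
        select (val).*
        from (select compute_pair_id_value(t.id, t.obj_value)
              from my_obj as t
              where t.id = id_obj) as x(val)
    loop
        return next pair;
    end loop;
    return;
end;
$$ language plpgsql;

-- Example call:
select * from compute_all_pair_id_value(1712437);
```

If nothing else needs to happen inside the loop, replacing the whole loop with a single return query over the same select avoids the row-by-row overhead.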

Related Solutions

Postgresql – Error: set_valued function called in context that cannot accept a set. What is it about

The error message isn't very helpful:

regress=> SELECT compute_all_pair_by_craig(100);
ERROR:  set-valued function called in context that cannot accept a set

but if you rephrase the query to call it as a proper set-returning function you'll see the real problem:

regress=> SELECT * FROM compute_all_pair_by_craig(100);
ERROR:  a column definition list is required for functions returning "record"
LINE 1: SELECT * FROM compute_all_pair_by_craig(100);

If you're using SETOF RECORD without an OUT parameter list you must specify the result columns in the calling statement, e.g.:

regress=> SELECT * FROM compute_all_pair_by_craig(100) theresult(a integer, b integer);

However, it's much better to use RETURNS TABLE or OUT parameters. With the former syntax your function would be:

create or replace function compute_all_pair_by_craig(id_obj bigint)
    returns table(a integer, b integer) as $$
begin
    return query select o.id, generate_series(0,o.value) from m_obj as o;     
end;
$$ language plpgsql;

This is callable in SELECT-list context and can be used without creating a type explicitly or specifying the result structure at the call site.
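The OUT-parameter variant mentioned above could look roughly like the following. This is a sketch under assumptions: the function name compute_all_pair_out is hypothetical, and it reads from a table my_obj(id bigint, obj_value integer) rather than the m_obj table used in the function above, so the first output column is declared bigint to match.

```sql
-- Hypothetical OUT-parameter version; the OUT list defines the result
-- columns, so callers need no column definition list.
create or replace function compute_all_pair_out(
    id_obj bigint,
    out a bigint,
    out b integer)
    returns setof record as $$
begin
    return query
        select o.id, generate_series(0, o.obj_value)
        from my_obj as o
        where o.id = id_obj;
end;
$$ language plpgsql;

-- Called like any table function:
select * from compute_all_pair_out(1712437);
```

RETURNS TABLE(a bigint, b integer) would declare the same result columns; the choice between the two is mostly a matter of style.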

As for the second half of the question, what's happening is that the first case specifies two separate columns in a SELECT-list, whereas the second returns a single composite. It actually has nothing to do with how you're returning the result, but with how you're invoking the function. If we create the sample function:

CREATE OR REPLACE FUNCTION twocols() RETURNS TABLE(a integer, b integer) 
AS $$ SELECT x, x FROM generate_series(1,5) x; $$ LANGUAGE sql;

You'll see the difference in the two ways to call a set-returning function - in the SELECT list, a PostgreSQL specific non-standard extension with quirky behaviour:

regress=> SELECT twocols();
 twocols 
---------
 (1,1)
 (2,2)
 (3,3)
 (4,4)
 (5,5)
(5 rows)

or as a table in the more standard way:

regress=> SELECT * FROM twocols();
 a | b 
---+---
 1 | 1
 2 | 2
 3 | 3
 4 | 4
 5 | 5
(5 rows)

Postgresql – RETURN NEXT in Postgres Function

If the results are not meant to be used in a subquery but by code, you may use a REFCURSOR in a transaction.

Example:

CREATE FUNCTION example_cursor() RETURNS refcursor AS $$
DECLARE
  c refcursor;
BEGIN
  c := 'mycursor';  -- name the caller will use in FETCH and CLOSE
  OPEN c FOR select * from generate_series(1,100000);
  return c;                                       
end;
$$ language plpgsql;

Usage for the caller:

BEGIN;
SELECT example_cursor();
 [output: mycursor]
FETCH 10 FROM mycursor;

 Output:

 generate_series 
-----------------
               1
               2
               3
               4
               5
               6
               7
               8
               9
              10

CLOSE mycursor;
END;

When not interested in piecemeal retrieval, FETCH ALL FROM cursorname may also be used to stream all results to the caller in one step.
