Postgresql – Recursive query using plpgsql

Tags: functions, plpgsql, postgresql, postgresql-9.2

I'm trying to write a plpgsql function that recursively returns a set of columns from records in a tree structure.

I have a data table and a table to link the data together:

DATATABLE
-----------
id integer
value text
info text

LINKTABLE
-----------
link integer
parent integer

My idea was to do it with a function like the following:

CREATE OR REPLACE FUNCTION my_function(itemID integer)
  RETURNS TABLE(id integer, value text) AS
$BODY$
BEGIN    
    RETURN QUERY SELECT my_function(A.link) FROM linktable A, datatable B 
        WHERE A.parent = B.id AND B.id = itemID;

    RETURN QUERY SELECT id, value FROM datatable WHERE id = itemID;            
    RETURN;
END;
$BODY$
  LANGUAGE plpgsql VOLATILE
  COST 100;

But that doesn't work; I get an error in my first query:

ERROR: structure of query does not match function result type

My Just-In-Brain compiler detects no problems, so what am I doing wrong here?

Best Answer

You don't need a function at all; this can be done with a single SQL statement:

with recursive tree (id, parent) as (
    select link as id,
           parent
    from linktable
    where link = itemid  -- start row; the output alias "id" cannot be referenced in WHERE

    union all

    select c.link as id,
           c.parent
    from linktable c
      join tree p on p.id = c.parent
) 
select dt.id, dt.value
from tree
  join datatable dt on dt.id = tree.id

Please see the manual for an introduction to recursive queries: http://www.postgresql.org/docs/current/static/queries-with.html
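As a quick sanity check of the recursive query above, here is a minimal sketch that runs a statement of the same shape against an in-memory SQLite database from Python. SQLite stands in for PostgreSQL only because it also supports WITH RECURSIVE. The sample rows, the `subtree` helper, and the assumption that the starting item has its own row in linktable (with a NULL parent for a root) are made up for illustration. The extra ORDER BY is there only to make the output deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE datatable (id INTEGER PRIMARY KEY, value TEXT, info TEXT);
    CREATE TABLE linktable (link INTEGER, parent INTEGER);
    -- hypothetical tree: 2 and 3 are children of 1, 4 is a child of 2, 5 is unrelated
    INSERT INTO datatable VALUES
        (1, 'root', ''), (2, 'a', ''), (3, 'b', ''), (4, 'c', ''), (5, 'other', '');
    INSERT INTO linktable VALUES (1, NULL), (2, 1), (3, 1), (4, 2), (5, NULL);
""")


def subtree(conn, item_id):
    """Return (id, value) for item_id and all of its descendants."""
    sql = """
        WITH RECURSIVE tree (id, parent) AS (
            -- anchor: the row for the starting item
            SELECT link, parent FROM linktable WHERE link = ?
            UNION ALL
            -- recursive step: rows whose parent is already in the tree
            SELECT c.link, c.parent
            FROM linktable c
              JOIN tree p ON p.id = c.parent
        )
        SELECT dt.id, dt.value
        FROM tree
          JOIN datatable dt ON dt.id = tree.id
        ORDER BY dt.id
    """
    return conn.execute(sql, (item_id,)).fetchall()


print(subtree(conn, 1))  # [(1, 'root'), (2, 'a'), (3, 'b'), (4, 'c')]
print(subtree(conn, 2))  # [(2, 'a'), (4, 'c')]
```

If the starting item has no row of its own in linktable, the anchor part would have to select from datatable instead; which variant fits depends on how the real data is stored.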

Related Solutions

Postgresql – Error: set_valued function called in context that cannot accept a set. What is it about

The error message from the question ("set-valued function called in context that cannot accept a set") isn't very helpful, but if you rephrase the query to call it as a proper set-returning function you'll see the real problem:

regress=> SELECT * FROM compute_all_pair_by_craig(100);
ERROR:  a column definition list is required for functions returning "record"
LINE 1: SELECT * FROM compute_all_pair_by_craig(100);

If you're using SETOF RECORD without an OUT parameter list, you must specify the result columns in the calling statement, e.g.:

regress=> SELECT * FROM compute_all_pair_by_craig(100) theresult(a integer, b integer);

However, it's much better to use RETURNS TABLE or OUT parameters. With the former syntax your function would be:

create or replace function compute_all_pair_by_craig(id_obj bigint)
    returns table(a integer, b integer) as $$
begin
    return query select o.id, generate_series(0,o.value) from m_obj as o;     
end;
$$ language plpgsql;

This is callable in SELECT-list context and can be used without creating a type explicitly or specifying the result structure at the call site.

As for the second half of the question, what's happening is that the first case specifies two separate columns in a SELECT list, whereas the second returns a single composite value. It actually has nothing to do with how you're returning the result, but with how you're invoking the function. If we create the sample function:

CREATE OR REPLACE FUNCTION twocols() RETURNS TABLE(a integer, b integer) 
AS $$ SELECT x, x FROM generate_series(1,5) x; $$ LANGUAGE sql;

You'll see the difference between the two ways to call a set-returning function. In the SELECT list, which is a PostgreSQL-specific non-standard extension with quirky behaviour:

regress=> SELECT twocols();
 twocols 
---------
 (1,1)
 (2,2)
 (3,3)
 (4,4)
 (5,5)
(5 rows)

or as a table in the more standard way:

regress=> SELECT * FROM twocols();
 a | b 
---+---
 1 | 1
 2 | 2
 3 | 3
 4 | 4
 5 | 5
(5 rows)

PostgreSQL – Is There a Type-Safe first() Aggregate Function?

`DISTINCT ON()`

Just as a side note, this is precisely what DISTINCT ON() does (not to be confused with DISTINCT). From the manual:

"SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first."

So if you were to write,

SELECT myFirstAgg(z)
FROM foo
GROUP BY x,y;

It's effectively

SELECT DISTINCT ON (x,y) z
FROM foo;
-- ORDER BY x, y, z;  (the DISTINCT ON expressions must come first in the ORDER BY)

In that it takes the first z. There are two important differences:

First, you can also select other columns at no cost of further aggregation:

SELECT DISTINCT ON (x,y) z, k, r, t, v
FROM foo;
-- ORDER BY x, y, z, k, r, t, v;

Second, because there is no GROUP BY, you cannot use (real) aggregates with it:

CREATE TABLE foo AS
SELECT * FROM ( VALUES
  (1,2,3),
  (1,2,4),
  (1,2,5)
) AS t(x,y,z);

-- fails, as you should expect.
SELECT DISTINCT ON (x,y) z, sum(z)
FROM foo;

-- would not otherwise fail.
SELECT myFirstAgg(z), sum(z)
FROM foo
GROUP BY x,y;

Don't forget `ORDER BY`

Also, while I didn't emphasize it above, I will now:

Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.

Always use an ORDER BY with DISTINCT ON.
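To make the effect of the ordering concrete, here is a small sketch of "first row per group under an explicit order". DISTINCT ON is PostgreSQL-specific, so this swaps in the portable ROW_NUMBER() window-function equivalent and runs it on an in-memory SQLite database from Python (this assumes the bundled SQLite is 3.25 or newer, which added window functions). The sample rows and the `first_per_group` helper are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE foo (x INTEGER, y INTEGER, z INTEGER);
    -- rows are deliberately inserted out of order
    INSERT INTO foo VALUES (1, 2, 5), (1, 2, 3), (1, 2, 4), (2, 2, 9), (2, 2, 7);
""")


def first_per_group(conn, descending=False):
    """Pick one row per (x, y); which row is 'first' is decided by the ORDER BY on z.

    Roughly what PostgreSQL would return for
        SELECT DISTINCT ON (x, y) x, y, z FROM foo ORDER BY x, y, z [DESC];
    """
    direction = "DESC" if descending else "ASC"
    sql = f"""
        SELECT x, y, z
        FROM (
            SELECT x, y, z,
                   ROW_NUMBER() OVER (PARTITION BY x, y ORDER BY z {direction}) AS rn
            FROM foo
        )
        WHERE rn = 1
        ORDER BY x, y
    """
    return conn.execute(sql).fetchall()


print(first_per_group(conn))                   # [(1, 2, 3), (2, 2, 7)]
print(first_per_group(conn, descending=True))  # [(1, 2, 5), (2, 2, 9)]
```

The same table yields a different "first" z per group depending on the sort direction, which is why leaving the order unspecified makes the result unpredictable.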

Using an Ordered-Set Aggregate Function

I imagine a lot of people are looking for a first_value ordered-set aggregate function, so I just wanted to throw that out there. It would look like this if the function existed as an ordered-set aggregate (first_value exists only as a window function):

SELECT x, y, first_value() WITHIN GROUP (ORDER BY z)
FROM foo
GROUP BY x, y;

But alas, it doesn't. You can do this instead:

SELECT x, y, percentile_disc(0) WITHIN GROUP (ORDER BY z)
FROM foo
GROUP BY x, y;