The approach you're using is unnecessarily complex - and very inefficient. Instead of the first function use:
create or replace function compute_pair_id_value(id bigint, value integer)
returns setof pair_id_value
as $$
SELECT $1, generate_series(0,$2);
$$
language sql;
or better, get rid of it entirely and write the whole operation like this:
-- Sample data creation:
CREATE TABLE my_obj(id bigint, obj_value integer);
insert into my_obj(id,obj_value) VALUES (1712437,2),(17000,5);
-- and the query:
SELECT id, generate_series(0,obj_value) FROM my_obj;
Resulting in:
regress=> SELECT id, generate_series(0,obj_value) FROM my_obj;
id | generate_series
---------+-----------------
1712437 | 0
1712437 | 1
1712437 | 2
17000 | 0
17000 | 1
17000 | 2
17000 | 3
17000 | 4
17000 | 5
(9 rows)
This exploits PostgreSQL's behaviour with set-returning functions called in the SELECT
list. Once PostgreSQL 9.3 comes out it can be replaced with a standards-compliant LATERAL
query.
Since it turns out your question was a simplified version of the real problem, let's tackle that. I'll work with the simplified compute_pair_id_value
above to avoid the hassle of plpython3. Here's how to do what you want:
SELECT (compute_pair_id_value(id,obj_value)).* FROM my_obj;
Result:
regress=> SELECT (compute_pair_id_value(id,obj_value)).* FROM my_obj;
id | value
---------+-------
1712437 | 0
1712437 | 1
1712437 | 2
17000 | 0
17000 | 1
17000 | 2
17000 | 3
17000 | 4
17000 | 5
(9 rows)
but again, be warned that compute_pair_id_value
will be called more than once. This is a limitation of PostgreSQL's query executor that can be avoided in 9.3 with LATERAL
support, but as far as I know you're stuck with it in 9.2 and below. Observe:
create or replace function compute_pair_id_value(id bigint, value integer)
returns setof pair_id_value
as $$
BEGIN
RAISE NOTICE 'compute_pair_id_value(%,%)',id,value;
RETURN QUERY SELECT $1, generate_series(0,$2);
END;
$$
language plpgsql;
output:
regress=> SELECT (compute_pair_id_value(id,obj_value)).* FROM my_obj;
NOTICE: compute_pair_id_value(1712437,2)
NOTICE: compute_pair_id_value(1712437,2)
NOTICE: compute_pair_id_value(17000,5)
NOTICE: compute_pair_id_value(17000,5)
id | value
---------+-------
1712437 | 0
1712437 | 1
1712437 | 2
17000 | 0
17000 | 1
17000 | 2
17000 | 3
17000 | 4
17000 | 5
(9 rows)
See how compute_pair_id_value
is called once per output column?
There is a workaround: Another layer of subquery to unpack the composite type result. See:
regress=> SELECT (val).* FROM (SELECT compute_pair_id_value(id,obj_value) FROM my_obj) x(val);
NOTICE: compute_pair_id_value(1712437,2)
NOTICE: compute_pair_id_value(17000,5)
id | value
---------+-------
1712437 | 0
1712437 | 1
1712437 | 2
17000 | 0
17000 | 1
17000 | 2
17000 | 3
17000 | 4
17000 | 5
(9 rows)
You can use the same technique in your code if you really must LOOP
over the results (it's slow to do that, so avoid it if you can).
DISTINCT ON()
Just as a side note, this is precisely what DISTINCT ON()
does (not to be confused with DISTINCT
)
SELECT DISTINCT ON ( expression [, ...] )
keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON
expressions are interpreted using the same rules as for ORDER BY
(see above). Note that the "first row" of each set is unpredictable unless ORDER BY
is used to ensure that the desired row appears first. For example
So if you were to write,
SELECT myFirstAgg(z)
FROM foo
GROUP BY x,y;
It's effectively
SELECT DISTINCT ON(x,y) z
FROM foo;
-- ORDER BY z;
In that it takes the first z
. There are two important differences,
You can also select other columns at no cost of further aggregation..
SELECT DISTINCT ON(x,y) z, k, r, t, v
FROM foo;
-- ORDER BY z, k, r, t, v;
Because there is no GROUP BY
you can not use (real) aggregates with it.
CREATE TABLE foo AS
SELECT * FROM ( VALUES
(1,2,3),
(1,2,4),
(1,2,5)
) AS t(x,y,z);
SELECT DISTINCT ON (x,y) z, sum(z)
FROM foo;
-- fails, as you should expect.
SELECT DISTINCT ON (x,y) z, sum(z)
FROM foo;
-- would not otherwise fail.
SELECT myFirstAgg(z), sum(z)
FROM foo
GROUP BY x,y;
Don't forget ORDER BY
Also, while I didn't bold it then I will now
Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first. For example
Always use an ORDER BY
with DISTINCT ON
Using an Ordered-Set Aggregate Function
I imagine a lot of people are looking for first_value
, Ordered-Set Aggregate Functions. Just wanted to throw that out there. It would look like this, if the function existed:
SELECT a, b, first_value() WITHIN GROUP (ORDER BY z)
FROM foo
GROUP BY a,b;
But, alas you can do this.
SELECT a, b, percentile_disc(0) WITHIN GROUP (ORDER BY z)
FROM foo
GROUP BY a,b;
Best Answer
What your
my_func
is essentially doing is creating aMATERIALIZED VIEW
-- a materialized view is a cached copy of a result set stored as a table. Drop the function and use the normalMATERIALIZED VIEW
.Skip generating an array -- waste of a time and space and may even be serializing the result set to disk twice. And, instead, just use something like this:
Now you can "refresh" this by doing
REFRESH foo_view;