PostgreSQL – Select First Row (Grouping) and Add Aggregate Function

aggregatepostgispostgresqlwindow functions

First have a look at this question on StackOverflow.

I'm looking to accomplish the same task, except I also need to add an aggregate function (PostGIS's ST_Union) to my query.

How can I combine the answer using DISTINCT with an aggregation…

I've tried:

SELECT DISTINCT ON (name, zonedistrict_id)
       ST_Union(geom) as geom, gid, name, zonedistrict_id, zonestyle_id, longname 
FROM zones
ORDER BY name, zonedistrict_id, zonestyle_id;

Which screams:

column "zones.gid" must appear in the GROUP BY clause or be used in an aggregate function

Which is strange, because if I remove ST_Union(geom) as geom,, the query works. But it's absent of the union'd geometry.

Best Answer

Answer to question

SELECT DISTINCT ON (name, zonedistrict_id)
       ST_Union(geom) as geom, gid, name, zonedistrict_id, zonestyle_id, longname 
FROM   zones
GROUP  BY gid, name, zonedistrict_id, zonestyle_id, longname
ORDER  BY name, zonedistrict_id, zonestyle_id;

It depends on what you are actually trying to achieve and what version of Postgres you are using and the table definition of the underlying table. You chose not to disclose any of that.

This gives you one distinct row per (name, zonedistrict_id), but geom is only the aggregate over (gid, name, zonedistrict_id, zonestyle_id, longname)

In older versions you had to list every non-aggregated column of the SELECT list in the GROUP BY list. This changed somewhat with PostgreSQL 9.1. I quote the release notes:

Allow non-GROUP BY columns in the query target list when the primary key is specified in the GROUP BY clause (Peter Eisentraut)

My guess

However, I suspect you secretly want this:

SELECT DISTINCT ON (name, zonedistrict_id)
       ST_Union(geom) OVER (PARTITION BY name, zonedistrict_id) AS geom
     , gid, name, zonedistrict_id, zonestyle_id, longname 
FROM   zones
ORDER  BY name, zonedistrict_id, zonestyle_id;

Using a ST_Union(geom) as aggregate window function. So geom is actually the aggregate over (name, zonedistrict_id), which seems more likely.

This should be possible since the PostGis manual tells us about its aggregate functions:

... can be used just like any other sql aggregate function such as sum, average.

And SQL aggregate functions can be used as window functions.