You can have window functions on the result of aggregate functions in a single query level.
This would all work nicely after a few modifications - except that it fails for the standard deviation on mathematical principle. The involved calculations are not linear, so you cannot simply combine standard deviations of sub-populations.
SELECT perm
,combo
,avg(value) AS perm_average_value
,sum(avg(value) * count(*)) OVER w_combo /
sum(count(*)) OVER w_combo AS combo_average_value
,stddev_pop(value) AS perm_stddev
,0 AS combo_stddev -- doesn't work!
,count(*) AS perm_count
,sum(count(*)) OVER w_combo AS combo_count
FROM foo
GROUP BY perm, combo
WINDOW w_combo AS (PARTITION BY combo);
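That said, the combined standard deviation is not lost entirely: if you carry the raw count, sum and sum of squares per group, the population standard deviation of the whole combo can be derived from those, because stddev_pop(x) = sqrt(avg(x²) - avg(x)²). A sketch of the math in Python with made-up numbers (the SQL equivalent would sum sum(value) and sum(value * value) over w_combo; note the identity can be numerically unstable for values far from zero):

```python
import math

def stddev_pop(xs):
    """Population standard deviation, like PostgreSQL's stddev_pop()."""
    m = sum(xs) / len(xs)
    return math.sqrt(sum((x - m) ** 2 for x in xs) / len(xs))

# Two sub-populations (perms) of one combo
a = [1.0, 3.0, 5.0]
b = [2.0, 8.0]

# Per-group partial aggregates: count, sum, sum of squares
n = len(a) + len(b)
s = sum(a) + sum(b)
sq = sum(x * x for x in a) + sum(x * x for x in b)

# Combined stddev from the partial aggregates
combined = math.sqrt(sq / n - (s / n) ** 2)

# Naively averaging the two stddevs gives a wrong result ...
naive = (stddev_pop(a) + stddev_pop(b)) / 2
# ... while the sum-of-squares formula matches stddev_pop over all rows
exact = stddev_pop(a + b)
```

This is why the 0 placeholder in the query above cannot be replaced by any simple combination of the per-group stddev values alone.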
For combo_average_value you would need this expression:

sum(avg(value) * count(*)) OVER w_combo / sum(count(*)) OVER w_combo

since you need a weighted average. (The average of a group with 10 members weighs more than the average of a group with just 2 members!)
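To see why the weighting matters, a quick sketch with made-up numbers:

```python
# Hypothetical groups of one combo: (group average, group row count)
groups = [(10.0, 10), (4.0, 2)]

# Naive average of averages ignores group sizes
naive = sum(avg for avg, _ in groups) / len(groups)

# Weighted average: sum(avg * count) / sum(count)
weighted = (sum(avg * n for avg, n in groups)
            / sum(n for _, n in groups))

print(naive, weighted)  # 7.0 9.0
```

The group of 10 rows pulls the combined average toward 10, which the plain average of the two group averages would miss.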
This works:
SELECT DISTINCT ON (perm, combo)
perm
,combo
,avg(value) OVER wpc AS perm_average_value
,avg(value) OVER wc AS combo_average_value
,stddev_pop(value) OVER wpc AS perm_stddev
,stddev_pop(value) OVER wc AS combo_stddev
,count(*) OVER wpc AS perm_count
,count(*) OVER wc AS combo_count
FROM foo
WINDOW wc AS (PARTITION BY combo)
,wpc AS (PARTITION BY perm, combo);
I am using two different windows here and reduce the rows with DISTINCT, which is applied even after window functions.
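The mechanics can be reproduced with any engine that has window functions. A small sketch using SQLite through Python's sqlite3 module (SQLite 3.25+; it has no DISTINCT ON or stddev_pop, so this uses plain DISTINCT and only avg/count, over made-up data):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE foo (perm TEXT, combo TEXT, value REAL);
INSERT INTO foo VALUES
 ('a','x',1), ('a','x',3), ('b','x',5),
 ('a','y',2), ('a','y',4);
""")

# Window functions are computed per row; DISTINCT then collapses the
# identical rows within each (perm, combo) partition.
rows = con.execute("""
SELECT DISTINCT perm, combo,
       avg(value) OVER (PARTITION BY perm, combo) AS perm_average_value,
       avg(value) OVER (PARTITION BY combo)       AS combo_average_value,
       count(*)   OVER (PARTITION BY combo)       AS combo_count
FROM   foo
ORDER  BY perm, combo
""").fetchall()

print(rows)
# [('a', 'x', 2.0, 3.0, 3), ('a', 'y', 3.0, 3.0, 2), ('b', 'x', 5.0, 3.0, 3)]
```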
But I seriously doubt it will be faster than your original query. I am pretty sure it isn't.
Better performance with altered table layout
Arrays have an overhead of 24 bytes (slight variations depending on type). Also, you seem to have quite a few items per array and many repetitions. For a huge table like yours it would pay to normalize the schema. Example layout:
CREATE TABLE combo (
combo_id serial PRIMARY KEY
,combo int[] NOT NULL
);
CREATE TABLE perm (
perm_id serial PRIMARY KEY
,perm int[] NOT NULL
);
CREATE TABLE value (
perm_id int REFERENCES perm(perm_id)
,combo_id int REFERENCES combo(combo_id)
,value numeric NOT NULL DEFAULT 0
);
If you don't need referential integrity you can omit the foreign key constraints.
The connection to combo_id could also be placed in the table perm, but in this scenario I would store it (slightly de-normalized) in value for better performance.
This would result in a row size of 32 bytes (tuple header + padding: 24 bytes, 2 x int: 8 bytes, no padding), plus the unknown size of your numeric column. (If you don't need extreme precision, a double precision or even a real column might do, too.)
More on physical storage in this related answer on SO or here:
Configuring PostgreSQL for read performance
Anyway, that's only a fraction of what you have now and would make your query a lot faster by size alone. Grouping and sorting on simple integers is also a lot faster.
You would first aggregate in a subquery and then join to perm and combo for best performance.
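A sketch of that pattern, again in SQLite through sqlite3 (SQLite has no serial or int[], so the keys become INTEGER PRIMARY KEY and the array columns plain text here; names and data are made up):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE combo (combo_id INTEGER PRIMARY KEY, combo TEXT NOT NULL);
CREATE TABLE perm  (perm_id  INTEGER PRIMARY KEY, perm  TEXT NOT NULL);
CREATE TABLE value (
    perm_id  INTEGER REFERENCES perm(perm_id),
    combo_id INTEGER REFERENCES combo(combo_id),
    value    REAL NOT NULL DEFAULT 0
);
INSERT INTO combo VALUES (1, '{1,2}'), (2, '{3,4}');
INSERT INTO perm  VALUES (1, '{1}'),   (2, '{2}');
INSERT INTO value VALUES (1,1,10), (1,1,20), (2,1,30), (2,2,40);
""")

# Group and aggregate on the cheap integer keys first,
# join to the lookup tables only for the final output.
rows = con.execute("""
SELECT p.perm, c.combo, v.perm_avg, v.perm_count
FROM  (SELECT perm_id, combo_id,
              avg(value) AS perm_avg,
              count(*)   AS perm_count
       FROM   value
       GROUP  BY perm_id, combo_id) v
JOIN   perm  p USING (perm_id)
JOIN   combo c USING (combo_id)
ORDER  BY p.perm, c.combo
""").fetchall()

print(rows)
```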
Answer to question
It depends on what you are actually trying to achieve, which version of Postgres you are using, and the table definition of the underlying table. You chose not to disclose any of that.
This gives you one distinct row per (name, zonedistrict_id), but geom is only the aggregate over (gid, name, zonedistrict_id, zonestyle_id, longname).
In older versions you had to list every non-aggregated column of the SELECT list in the GROUP BY list. This changed somewhat with PostgreSQL 9.1. I quote the release notes:

My guess
However, I suspect you secretly want this: ST_Union(geom) used as an aggregate window function, so that geom is actually the aggregate over (name, zonedistrict_id), which seems more likely. This should be possible, since the PostGIS manual tells us about its aggregate functions:
And SQL aggregate functions can be used as window functions.
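The mechanics are the same for any aggregate; here is a PostGIS-free sketch with the built-in sum standing in for ST_Union, using SQLite via sqlite3 over a hypothetical zones table:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE zones (name TEXT, zonedistrict_id INT, area REAL)")
con.executemany("INSERT INTO zones VALUES (?,?,?)",
                [('a', 1, 2.0), ('a', 1, 3.0), ('b', 2, 4.0)])

# Any aggregate becomes a window function by attaching an OVER clause;
# DISTINCT then reduces to one row per (name, zonedistrict_id).
rows = con.execute("""
SELECT DISTINCT name, zonedistrict_id,
       sum(area) OVER (PARTITION BY name, zonedistrict_id) AS total_area
FROM   zones
ORDER  BY name
""").fetchall()

print(rows)  # [('a', 1, 5.0), ('b', 2, 4.0)]
```

In Postgres with PostGIS, ST_Union(geom) OVER (PARTITION BY name, zonedistrict_id) would follow the same pattern.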