You can apply window functions to the result of aggregate functions at the same query level.
This would all work nicely after a few modifications, except that it fails for the standard deviation on mathematical principle. The calculations involved are not linear, so you cannot simply combine standard deviations of sub-populations.
SELECT perm
,combo
,avg(value) AS perm_average_value
,sum(avg(value) * count(*)) OVER w_combo /
sum(count(*)) OVER w_combo AS combo_average_value
,stddev_pop(value) AS perm_stddev
,0 AS combo_stddev -- doesn't work!
,count(*) AS perm_count
,sum(count(*)) OVER w_combo AS combo_count
FROM foo
GROUP BY perm, combo
WINDOW w_combo AS (PARTITION BY combo);
For combo_average_value you would need this expression:

sum(avg(value) * count(*)) OVER w_combo / sum(count(*)) OVER w_combo

since you need a weighted average: the average of a group with 10 members weighs more than the average of a group with just 2 members!
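For the record, the combined standard deviation can still be derived at this query level, just not by combining the group results directly: the law of total variance rebuilds the combined population variance from per-group counts, means and variances. An untested sketch for the combo_stddev column (replacing the 0 placeholder above, same window w_combo):

sqrt(
   sum(count(*) * (stddev_pop(value)^2 + avg(value)^2)) OVER w_combo
 / sum(count(*)) OVER w_combo
 - (sum(avg(value) * count(*)) OVER w_combo
  / sum(count(*)) OVER w_combo)^2
) AS combo_stddev

Note that this only reconstructs stddev_pop; the sample variant stddev_samp would need a different correction.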
This works:
SELECT DISTINCT ON (perm, combo)
perm
,combo
,avg(value) OVER wpc AS perm_average_value
,avg(value) OVER wc AS combo_average_value
,stddev_pop(value) OVER wpc AS perm_stddev
,stddev_pop(value) OVER wc AS combo_stddev
,count(*) OVER wpc AS perm_count
,count(*) OVER wc AS combo_count
FROM foo
WINDOW wc AS (PARTITION BY combo)
,wpc AS (PARTITION BY perm, combo);
I am using two different windows here and reduce the rows with DISTINCT ON, which is applied after window functions.
But I seriously doubt it will be faster than your original query. I am pretty sure it isn't.
Better performance with altered table layout
Arrays have an overhead of 24 bytes (slight variations depending on type). Also, you seem to have quite a few items per array and many repetitions. For a huge table like yours it would pay to normalize the schema. Example layout:
CREATE TABLE combo (
combo_id serial PRIMARY KEY
,combo int[] NOT NULL
);
CREATE TABLE perm (
perm_id serial PRIMARY KEY
,perm int[] NOT NULL
);
CREATE TABLE value (
perm_id int REFERENCES perm(perm_id)
,combo_id int REFERENCES combo(combo_id)
,value numeric NOT NULL DEFAULT 0
);
If you don't need referential integrity you can omit the foreign key constraints.
The connection to combo_id could also be placed in the table perm, but in this scenario I would store it (slightly denormalized) in value for better performance.
This would result in a row size of 32 bytes (tuple header + padding: 24 bytes, 2 x int: 8 bytes, no padding), plus the unknown size of your numeric column. (If you don't need extreme precision, a double precision or even a real column might do, too.)
More on physical storage in this related answer on SO, or here: Configuring PostgreSQL for read performance.
Anyway, that's only a fraction of what you have now and would make your query a lot faster by size alone. Grouping and sorting on simple integers is also a lot faster.
For best performance, you would first aggregate in a subquery and then join to perm and combo.
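An untested sketch of that query shape against the layout above; the per-combo window computations would sit on top of this, as in the first query:

SELECT p.perm
,c.combo
,v.perm_average_value
,v.perm_stddev
,v.perm_count
FROM (
   SELECT perm_id
   ,combo_id
   ,avg(value) AS perm_average_value
   ,stddev_pop(value) AS perm_stddev
   ,count(*) AS perm_count
   FROM value
   GROUP BY perm_id, combo_id
   ) v
JOIN perm p USING (perm_id)
JOIN combo c USING (combo_id);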
This can be much simpler yet with DISTINCT ON:
SELECT DISTINCT ON (product_id)
product_id
, CASE WHEN stock = 0 THEN NULL ELSE warehouse_id END AS warehouse_id
, stock
, CASE WHEN stock = 0 THEN NULL ELSE price END AS price
FROM product_stock
ORDER BY product_id, (stock = 0), price;
This assumes stock to be NOT NULL.
SQL Fiddle.
About DISTINCT ON:
Postgres has a proper boolean type and one can ORDER BY any boolean expression: FALSE sorts before TRUE, and TRUE sorts before NULL. So rows with (stock = 0) sort after rows with any other value for stock, except NULL, which would sort last.
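A quick way to verify that sort order, with a throwaway VALUES list (not part of the answer above):

SELECT b
FROM (VALUES (false), (true), (NULL::boolean)) t(b)
ORDER BY b;  -- returns false, true, NULL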
This is a variation of the greatest-n-per-group problem.
I would do this with a join to a derived table:
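A minimal sketch of that pattern; tbl, grp and val are placeholder names standing in for the table from the question:

SELECT t.*
FROM tbl t
JOIN (
   SELECT grp, max(val) AS val
   FROM tbl
   GROUP BY grp
   ) m USING (grp, val);

Unlike DISTINCT ON, this returns all rows that tie for the greatest val per grp.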