The elementary difference is that window functions compute additional columns over all rows in a result set after the rest of the result has been determined. No row is dropped. They have been available since PostgreSQL 8.4.
The LIMIT and OFFSET clauses of the SELECT command, on the other hand, do not compute additional columns. They just pick a certain "window" of rows from the result set (in cooperation with the ORDER BY clause) and discard the rest. They have been around practically forever.
While certain tasks can be tackled with either of these tools, they are very different in nature.
For your simple task of "sorting data on date and then bringing the latest data first", you don't need either of them. Just add:
ORDER BY date_col DESC
According to your comment, you would need:
SELECT col1, col2
FROM tbl
ORDER BY date_col DESC
LIMIT 100 -- 100 latest rows
OFFSET 0; -- just noise, but may be easier to code
Retrieve more:
...
LIMIT 100
OFFSET 100; -- next page of 100 rows ...
Be sure to have an index on date_col in either case!
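As a cross-check, the paging behavior can be sketched with Python's standard-library sqlite3 module (SQLite stands in for PostgreSQL here; the names tbl, date_col, and col1 are just the placeholders used above, and the sample rows are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (col1 TEXT, date_col TEXT)")
rows = [(f"row{i}", f"2024-01-{i:02d}") for i in range(1, 6)]
conn.executemany("INSERT INTO tbl VALUES (?, ?)", rows)

# An index on date_col lets the ORDER BY be served by an index scan.
conn.execute("CREATE INDEX tbl_date_col_idx ON tbl (date_col)")

# Page 1: the two latest rows.
page1 = conn.execute(
    "SELECT col1 FROM tbl ORDER BY date_col DESC LIMIT 2 OFFSET 0"
).fetchall()

# Page 2: the next two.
page2 = conn.execute(
    "SELECT col1 FROM tbl ORDER BY date_col DESC LIMIT 2 OFFSET 2"
).fetchall()

print(page1)  # [('row5',), ('row4',)]
print(page2)  # [('row3',), ('row2',)]
```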
You can have window functions on the result of aggregate functions in a single query level.
This would all work nicely after a few modifications, except that it fails for the standard deviation on mathematical principle. The involved calculations are not linear, so you cannot simply combine standard deviations of sub-populations.
SELECT perm
      ,combo
      ,avg(value) AS perm_average_value
      ,sum(avg(value) * count(*)) OVER w_combo /
       sum(count(*)) OVER w_combo AS combo_average_value
      ,stddev_pop(value) AS perm_stddev
      ,0 AS combo_stddev  -- doesn't work!
      ,count(*) AS perm_count
      ,sum(count(*)) OVER w_combo AS combo_count
FROM   foo
GROUP  BY perm, combo
WINDOW w_combo AS (PARTITION BY combo);
For combo_average_value you would need this expression:

sum(avg(value) * count(*)) OVER w_combo / sum(count(*)) OVER w_combo

since you need a weighted average. (The average of a group with 10 members weighs more than the average of a group with just 2 members!)
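A quick sketch in plain Python (with made-up group averages and counts) shows why the naive average of group averages goes wrong:

```python
# Two groups within one combo: one with 10 members, one with 2.
groups = [
    {"avg": 4.0, "count": 10},  # avg(value), count(*) of one (perm, combo) group
    {"avg": 1.0, "count": 2},
]

# Naive average of the group averages ignores group size:
naive = sum(g["avg"] for g in groups) / len(groups)  # (4.0 + 1.0) / 2 = 2.5

# Weighted average, the Python analogue of
# sum(avg(value) * count(*)) OVER w_combo / sum(count(*)) OVER w_combo:
weighted = sum(g["avg"] * g["count"] for g in groups) / sum(
    g["count"] for g in groups
)  # (4.0*10 + 1.0*2) / 12 = 3.5

print(naive, weighted)
```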
This works:
SELECT DISTINCT ON (perm, combo)
perm
,combo
,avg(value) OVER wpc AS perm_average_value
,avg(value) OVER wc AS combo_average_value
,stddev_pop(value) OVER wpc AS perm_stddev
,stddev_pop(value) OVER wc AS combo_stddev
,count(*) OVER wpc AS perm_count
,count(*) OVER wc AS combo_count
FROM foo
WINDOW wc AS (PARTITION BY combo)
,wpc AS (PARTITION BY perm, combo);
I am using two different windows here and reduce the rows with DISTINCT ON, which is applied after window functions.
But I seriously doubt it will be faster than your original query. I am pretty sure it isn't.
Better performance with altered table layout
Arrays have an overhead of 24 bytes (slight variations depending on type). Also, you seem to have quite a few items per array and many repetitions. For a huge table like yours it would pay to normalize the schema. Example layout:
CREATE TABLE combo (
   combo_id serial PRIMARY KEY
  ,combo int[] NOT NULL
);

CREATE TABLE perm (
   perm_id serial PRIMARY KEY
  ,perm int[] NOT NULL
);

CREATE TABLE value (
   perm_id int REFERENCES perm(perm_id)
  ,combo_id int REFERENCES combo(combo_id)
  ,value numeric NOT NULL DEFAULT 0
);
If you don't need referential integrity you can omit the foreign key constraints.
The connection to combo_id could also be placed in the table perm, but in this scenario I would store it (slightly de-normalized) in value for better performance.
This would result in a row size of 32 bytes (tuple header + padding: 24 bytes, 2 x integer: 8 bytes, no padding), plus the unknown size of your numeric column. (If you don't need extreme precision, a double precision or even a real column might do, too.)
More on physical storage in this related answer on SO or here:
Configuring PostgreSQL for read performance
Anyway, that's only a fraction of what you have now and would make your query a lot faster by size alone. Grouping and sorting on simple integers is also a lot faster.
You would first aggregate in a subquery and then join to perm and combo for best performance.
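A minimal sketch of that pattern, using Python's sqlite3 as a stand-in for PostgreSQL (serial becomes INTEGER PRIMARY KEY in SQLite, and the sample arrays and values are invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Schema mirrors the normalized layout above.
conn.executescript("""
CREATE TABLE combo (combo_id INTEGER PRIMARY KEY, combo TEXT NOT NULL);
CREATE TABLE perm  (perm_id  INTEGER PRIMARY KEY, perm  TEXT NOT NULL);
CREATE TABLE value (perm_id INT, combo_id INT, value REAL NOT NULL DEFAULT 0);
INSERT INTO combo VALUES (1, '{1,2}');
INSERT INTO perm  VALUES (1, '{1,2}'), (2, '{2,1}');
INSERT INTO value VALUES (1, 1, 1.0), (1, 1, 3.0), (2, 1, 5.0);
""")

# Group and sort on small integers in the subquery; join to the
# (wider) array columns only after aggregation.
rows = conn.execute("""
SELECT p.perm, c.combo, v.perm_avg, v.perm_count
FROM  (SELECT perm_id, combo_id
             ,avg(value) AS perm_avg
             ,count(*)   AS perm_count
       FROM   value
       GROUP  BY perm_id, combo_id) v
JOIN   perm  p ON p.perm_id  = v.perm_id
JOIN   combo c ON c.combo_id = v.combo_id
ORDER  BY p.perm_id
""").fetchall()

print(rows)  # [('{1,2}', '{1,2}', 2.0, 2), ('{2,1}', '{1,2}', 5.0, 1)]
```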
Best Answer
Your issue appears to be that you are applying the same WINDOW (named w) for both your COUNT(*) and your rank().

When you use a WINDOW that contains an ORDER BY clause and then apply certain aggregates such as SUM or COUNT, the aggregation runs cumulatively across the ordering, which is why your COUNT and rank() are identical.

If you modify your query to have multiple windows, so that you apply only the PARTITION BY to your COUNT(*) window and have both PARTITION BY and ORDER BY clauses for your rank(), I believe you'll get the results you want.

Refer to this SQL Fiddle as a reference, where I have a generic id field, a com_num field to represent the competition id, and a com_time field to represent a competitor's time.
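The two-window fix can be sketched with Python's sqlite3 (window functions require SQLite 3.25+; the table name results and the sample times are made up, while the field names follow the answer):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE results (id INTEGER PRIMARY KEY, com_num INT, com_time REAL)"
)
conn.executemany(
    "INSERT INTO results (com_num, com_time) VALUES (?, ?)",
    [(1, 10.5), (1, 11.0), (1, 12.2), (2, 9.8), (2, 10.1)],
)

rows = conn.execute("""
SELECT com_num
      ,com_time
      -- PARTITION BY only: total competitors per competition
      ,count(*) OVER (PARTITION BY com_num)                   AS competitors
      -- PARTITION BY + ORDER BY: rank within the competition
      ,rank()   OVER (PARTITION BY com_num ORDER BY com_time) AS position
FROM   results
ORDER  BY com_num, position
""").fetchall()

for r in rows:
    print(r)
```

Without the ORDER BY in its window, the count covers the whole partition instead of running cumulatively, so it no longer mirrors the rank.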