Okay, so here is the query modified to work the way you want:
DECLARE @players table
(
    PlayerID uniqueidentifier NOT NULL PRIMARY KEY,
    PlayerName nvarchar(64) NOT NULL
);

DECLARE @playerScores table
(
    ID bigint NOT NULL IDENTITY PRIMARY KEY,
    PlayerID uniqueidentifier NOT NULL,
    DateCreated datetime NOT NULL,
    Score int NOT NULL,
    TimeTaken bigint NOT NULL,
    PuzzleID int NOT NULL
);
DECLARE @puzzleId int = 0;
SELECT TOP 50
    a.PlayerID,
    p.PlayerName,
    a.Score,
    a.TimeTaken,
    a.PlayedDate
FROM
(
    SELECT
        ps.PlayerID,
        ps.Score,
        ps.TimeTaken,
        ps.DateCreated AS PlayedDate,
        ROW_NUMBER() OVER
        (
            PARTITION BY ps.PlayerID
            ORDER BY ps.Score DESC, ps.TimeTaken, ps.DateCreated
        ) AS RN
    FROM @playerScores ps
    WHERE ps.PuzzleID = @puzzleId
) a
INNER JOIN @players p ON p.PlayerID = a.PlayerID
WHERE a.RN = 1
ORDER BY
    a.Score DESC,
    a.TimeTaken,
    a.PlayedDate;
Having written this (note: indexes are not optimized), and looking at the other queries you're going to need to write, what I would actually recommend is to abandon this type of query entirely and create a denormalized high-score table (rows are unique on the combination of PlayerID, PuzzleID), on which to run aggregates instead.
The reason is that the GameResult table is going to grow huge in the database, so it will become less and less efficient to run aggregates on it directly as time passes, and the requirements are incompatible with doing something like creating an indexed view to summarize the information.
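For illustration, here is a minimal sketch of what that could look like; the HighScores name, the MERGE-based upsert, and the @-parameters are assumptions made for the example, not part of your schema:

-- Hypothetical denormalized high-score table: one row per (PlayerID, PuzzleID).
CREATE TABLE dbo.HighScores
(
    PlayerID uniqueidentifier NOT NULL,
    PuzzleID int NOT NULL,
    Score int NOT NULL,
    TimeTaken bigint NOT NULL,
    PlayedDate datetime NOT NULL,
    CONSTRAINT PK_HighScores PRIMARY KEY (PuzzleID, PlayerID)
);

-- Upsert run for each new game result; @playerId, @puzzleId, @score,
-- @timeTaken, @playedDate are assumed to carry the incoming result.
MERGE dbo.HighScores AS t
USING (SELECT @playerId, @puzzleId, @score, @timeTaken, @playedDate)
      AS s (PlayerID, PuzzleID, Score, TimeTaken, PlayedDate)
ON t.PlayerID = s.PlayerID AND t.PuzzleID = s.PuzzleID
WHEN MATCHED AND (s.Score > t.Score
               OR (s.Score = t.Score AND s.TimeTaken < t.TimeTaken)) THEN
    -- Keep only the player's best result per puzzle.
    UPDATE SET Score = s.Score, TimeTaken = s.TimeTaken, PlayedDate = s.PlayedDate
WHEN NOT MATCHED THEN
    INSERT (PlayerID, PuzzleID, Score, TimeTaken, PlayedDate)
    VALUES (s.PlayerID, s.PuzzleID, s.Score, s.TimeTaken, s.PlayedDate);

The leaderboard query above then collapses to a plain SELECT TOP 50 ... FROM dbo.HighScores WHERE PuzzleID = @puzzleId ORDER BY Score DESC, TimeTaken, PlayedDate, with no ROW_NUMBER() pass over the full history.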
Also, if you aren't doing this already, it's highly likely you'll want to use an asynchronous process to compute the "leaderboards" periodically and cache the results, instead of computing them just-in-time. (You could do something like merge the current player's score with the cached leaderboards, so the player can see themselves on the leaderboards immediately if they got a high score.) See my answer here for some ideas to consider when implementing a caching mechanism.
The query you have
You could simplify your query using a WINDOW clause, but that's just shortening the syntax, not changing the query plan.
SELECT id, trans_ref_no, amount, trans_date, entity_id
     , SUM(amount) OVER w AS trans_total
     , COUNT(*) OVER w AS trans_count
FROM   transactiondb
WINDOW w AS (PARTITION BY entity_id, date_trunc('month', trans_date)
             ORDER BY trans_date
             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING);
- Also using the slightly faster count(*), since id is certainly defined NOT NULL? (A quick illustration follows this list.)
- And you don't need to ORDER BY entity_id since you already PARTITION BY entity_id.
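That count() difference, as a toy illustration (the values are made up):

SELECT count(*) AS all_rows    -- counts every row
     , count(x) AS non_null_x  -- skips NULLs; slightly more work per row
FROM  (VALUES (1), (NULL), (3)) AS t (x);
-- all_rows = 3, non_null_x = 2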
You can simplify further, though:
Don't add ORDER BY to the window definition at all; it's not relevant to your query. Then you don't need to define a custom window frame, either:
SELECT id, trans_ref_no, amount, trans_date, entity_id
     , SUM(amount) OVER w AS trans_total
     , COUNT(*) OVER w AS trans_count
FROM   transactiondb
WINDOW w AS (PARTITION BY entity_id, date_trunc('month', trans_date));
Simpler, faster, but still just a better version of what you have, with static months.
The query you might want
... is not clearly defined, so I'll build on these assumptions:
Count transactions and amount for every 30-day period within the first and last transaction of any entity_id. Exclude leading and trailing periods without activity, but include all possible 30-day periods within those outer bounds.
SELECT entity_id, trans_date
     , COALESCE(sum(daily_amount) OVER w, 0) AS trans_total
     , COALESCE(sum(daily_count)  OVER w, 0) AS trans_count
FROM  (
   SELECT entity_id
        , generate_series(min(trans_date)::timestamp
                        , GREATEST(min(trans_date), max(trans_date) - 29)::timestamp
                        , interval '1 day')::date AS trans_date
   FROM   transactiondb
   GROUP  BY 1
   ) x
LEFT JOIN (
   SELECT entity_id, trans_date
        , sum(amount) AS daily_amount, count(*) AS daily_count
   FROM   transactiondb
   GROUP  BY 1, 2
   ) t USING (entity_id, trans_date)
WINDOW w AS (PARTITION BY entity_id ORDER BY trans_date
             ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING);
This lists all 30-day periods for each entity_id with your aggregates and with trans_date being the first day (incl.) of the period. To get values for each individual row, join to the base table once more, as sketched below.
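A sketch of that join-back, wrapping the query above in a CTE (same table and column names as before):

WITH periods AS (  -- the 30-day-period query from above, unchanged
   SELECT entity_id, trans_date
        , COALESCE(sum(daily_amount) OVER w, 0) AS trans_total
        , COALESCE(sum(daily_count)  OVER w, 0) AS trans_count
   FROM  (
      SELECT entity_id
           , generate_series(min(trans_date)::timestamp
                           , GREATEST(min(trans_date), max(trans_date) - 29)::timestamp
                           , interval '1 day')::date AS trans_date
      FROM   transactiondb
      GROUP  BY 1
      ) x
   LEFT JOIN (
      SELECT entity_id, trans_date
           , sum(amount) AS daily_amount, count(*) AS daily_count
      FROM   transactiondb
      GROUP  BY 1, 2
      ) t USING (entity_id, trans_date)
   WINDOW w AS (PARTITION BY entity_id ORDER BY trans_date
                ROWS BETWEEN CURRENT ROW AND 29 FOLLOWING)
   )
SELECT b.id, b.trans_ref_no, b.amount, b.trans_date, b.entity_id
     , p.trans_total, p.trans_count
FROM   transactiondb b
JOIN   periods p USING (entity_id, trans_date);

Note that transactions in the trailing 29 days have no 30-day period starting at them under the assumptions above, so they drop out of this inner join; switch to a LEFT JOIN to keep them.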
The basic difficulty is the same as discussed here:
The frame definition of a window cannot depend on values of the current row.
And rather call generate_series() with timestamp input, like the query above does.
The query you actually want
After question update and discussion:
Accumulate rows of the same entity_id in a 30-day window starting at each actual transaction.
Since your data is distributed sparsely, it should be more efficient to run a self-join with a range condition, all the more since Postgres 9.1 does not have LATERAL joins yet:
SELECT t0.id, t0.amount, t0.trans_date, t0.entity_id
     , sum(t1.amount) AS trans_total, count(*) AS trans_count
FROM   transactiondb t0
JOIN   transactiondb t1 USING (entity_id)
WHERE  t1.trans_date >= t0.trans_date
AND    t1.trans_date <  t0.trans_date + 30  -- exclude upper bound
-- AND t0.entity_id = 114284                -- or pick a single entity ...
GROUP  BY t0.id  -- id is the PK, so this covers all t0 columns
ORDER  BY t0.trans_date, t0.id;
SQL Fiddle.
A rolling window could only make sense (with respect to performance) with data for most days.
This does not aggregate duplicates on (trans_date, entity_id) per day, but all rows of the same day are always included in the 30-day window.
For a big table, a covering index like this could help quite a bit:
CREATE INDEX transactiondb_foo_idx
ON transactiondb (entity_id, trans_date, amount);
The last column amount is only useful if you get index-only scans out of it. Else drop it.
But it's not going to be used while you select the whole table anyway. It would support queries for a small subset.
Best Answer
ROLLUP was implemented in Postgres 9.5.
Original answer for Postgres 9.4:
The rollup() in group by rollup(i_category, i_class) causes aggregation in hierarchical steps. It's an extension of the ISO SQL standard in SQL Server that's not implemented in Postgres 9.4. For two items, you need three steps in standard SQL, as well as in Postgres:
- group by i_category, i_class
- group by i_category
- one aggregation over the whole set (the grand total)
- The grouping() function comes with this extension and indicates whether a given column is aggregated in each row. It's used here to sort the total and group sums first in the result.
- rank() is a standard window function that works the same in Postgres, but we have to rewrite it due to the above adjustments.
- top 100 translates to FETCH FIRST 100 ROWS ONLY in standard SQL, which is implemented in Postgres as well as the shorter LIMIT 100. (FETCH FIRST syntax works in SQL Server 2012 and later, too.) It's a bit odd to cut off after 100 rows in this aggregate query.
- I am using a CTE to reuse the intermediary results in multiple aggregations below. Except for the LIMIT, everything should basically work in SQL Server just as well.
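A minimal sketch of the three-step technique, using a hypothetical sales table with columns i_category, i_class, and price as placeholders:

-- Hypothetical table: sales (i_category, i_class, price).
WITH cte AS (          -- reused by all three grouping steps
   SELECT i_category, i_class, sum(price) AS total_sum
   FROM   sales
   GROUP  BY 1, 2      -- finest level: group by i_category, i_class
   )
SELECT i_category, i_class, total_sum
     , 0 AS lochierarchy            -- stands in for grouping(): detail row
FROM   cte
UNION ALL
SELECT i_category, NULL, sum(total_sum)
     , 1                            -- subtotal per i_category
FROM   cte
GROUP  BY 1
UNION ALL
SELECT NULL, NULL, sum(total_sum)
     , 2                            -- grand total over the whole set
FROM   cte
ORDER  BY lochierarchy DESC, i_category, total_sum DESC
LIMIT  100;                         -- stands in for TOP 100 / FETCH FIRST

On Postgres 9.5 or later, GROUP BY ROLLUP (i_category, i_class) produces the same three grouping sets in a single statement.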