Note the slightly modified schema of my test in the fiddle: actual primary keys and proper column names instead of id.
Also, you seem to be operating with dates exclusively, so I suggest converting your timestamp columns to date.
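A minimal sketch of such a conversion (assuming the column is created_at on project_report, as used in the queries below; the explicit USING clause keeps the cast unambiguous):

ALTER TABLE project_report
  ALTER COLUMN created_at TYPE date
  USING created_at::date;  -- each timestamp is truncated to its calendar day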
Items 1 and 2
SELECT k.keyword_id
     , k.name
     , pr.project_id
     , COALESCE(min(pr.position), 0) AS pos
     , COALESCE(pr.created_at, now()::date) AS created_at
FROM   keyword k
LEFT   JOIN project_report pr USING (keyword_id)
GROUP  BY k.keyword_id, pr.project_id, pr.created_at
ORDER  BY keyword_id, created_at;
- In Postgres 9.1 or later the primary key column covers the whole table in GROUP BY.
- Use COALESCE to replace possible NULL values.
A guess at item 3
WITH cte AS (
   SELECT k.keyword_id
        , k.name
        , pr.project_id
        , COALESCE(min(pr.position), 0) AS pos
        , COALESCE(pr.created_at, now()::date) AS created_at
   FROM   keyword k
   LEFT   JOIN project_report pr USING (keyword_id)
   GROUP  BY k.keyword_id, pr.project_id, pr.created_at
   )
, x AS (
   SELECT DISTINCT ON (keyword_id, project_id) *
   FROM   cte
   ORDER  BY keyword_id, project_id, created_at DESC
   )
SELECT x.*
     , COALESCE(y.pos, 0) AS yesterday_pos
     , COALESCE(w.pos, 0) AS week_pos
     , COALESCE(m.pos, 0) AS month_pos
FROM   x
LEFT   JOIN cte y ON  y.keyword_id = x.keyword_id
                  AND y.project_id = x.project_id
                  AND y.created_at = x.created_at - interval '1 day'
LEFT   JOIN cte w ON  w.keyword_id = x.keyword_id
                  AND w.project_id = x.project_id
                  AND w.created_at = x.created_at - interval '1 week'
LEFT   JOIN cte m ON  m.keyword_id = x.keyword_id
                  AND m.project_id = x.project_id
                  AND m.created_at = x.created_at - interval '1 month';
Explain
- In CTE cte, produce daily aggregates per (k.keyword_id, pr.project_id).
- In CTE x, pick the latest day per (k.keyword_id, pr.project_id).
- In the outer query, LEFT JOIN the latest day x to cte multiple times to retrieve past values from the same (k.keyword_id, pr.project_id) for 1 day / week / month earlier.
SQL Fiddle.
A few thoughts. Consider range partitioning on "timestamp"; it will reduce the amount of work your query has to do. A further optimization might be to calculate and store the aggregates for "closed" partitions. You will of course have to recalculate these when you modify historical information.
As a bonus, it will be much easier to roll out historical data that is no longer needed.
As mentioned, these are just some thoughts and reflections; they may or may not fit your situation. One pitfall may be the number of different intervals you use for reporting: if the larger intervals are not multiples of the partitioning interval, this won't work well.
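A minimal sketch of what that could look like with declarative range partitioning (this assumes Postgres 10 or later and monthly partitions; only id and "timestamp" are taken from your query, the rest of the column list is omitted):

CREATE TABLE market_trades_part (
    id          bigint,
    "timestamp" timestamp NOT NULL
    -- ... remaining columns of market_trades
) PARTITION BY RANGE ("timestamp");

-- one partition per month; aggregates for "closed" months can be precomputed and stored
CREATE TABLE market_trades_2013_08 PARTITION OF market_trades_part
    FOR VALUES FROM ('2013-08-01') TO ('2013-09-01');

-- rolling out data that is no longer needed is then just:
-- DROP TABLE market_trades_2013_08;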
Edit: How is the plan affected by creating a temp table and an index on "timestamp"?
create index X on market_trades ( timestamp );
create temp table T ( time_range tsrange );
insert into T ( time_range )
WITH vals AS (
    SELECT '2013-08-19 0:00'::timestamp AS frame_start,
           '2013-08-26 0:00'::timestamp AS frame_end,
           '17h'::interval AS interval_length
), intervals AS (
    SELECT tsrange(start_time,
                   lead(start_time, 1, frame_end) OVER (ORDER BY start_time NULLS FIRST)) AS time_range
    FROM (
        SELECT generate_series(frame_start, frame_end, interval_length) AS start_time,
               frame_end
        FROM vals
    ) _
    WHERE start_time < frame_end
)
SELECT time_range
FROM intervals i;
create index x on T ( time_range );
analyze t;
SELECT time_range, count(td.id) AS agg
FROM T
LEFT JOIN market_trades td
ON td.timestamp <@ T.time_range
GROUP BY T.time_range
ORDER BY T.time_range;
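To see how the plan changes, you can run the final query under EXPLAIN, e.g.:

EXPLAIN (ANALYZE, BUFFERS)
SELECT time_range, count(td.id) AS agg
FROM T
LEFT JOIN market_trades td
ON td.timestamp <@ T.time_range
GROUP BY T.time_range
ORDER BY T.time_range;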
Best Answer
You can use a CASE expression inside the COUNT aggregate function. COUNT does not count NULL values, which is why this works.
Here is a SQL Fiddle that demonstrates the concept.
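A minimal sketch of the pattern (hypothetical orders table and column names, since the original schema is only in the fiddle):

SELECT status,
       count(*) AS total_rows,
       count(CASE WHEN amount > 100 THEN 1 END) AS big_orders
FROM orders
GROUP BY status;
-- CASE yields NULL when the condition is not met, and COUNT skips NULLs,
-- so big_orders only counts rows with amount > 100.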