A few thoughts. Consider range partitioning on "timestamp", it will reduce the amount of work your query have to do. Further optimization might be to calculate and store the agg for "closed" partitions. You will of course have to recalculate this when you modify historical information.
As a bonus it will be much easier to roll-out historical data that is no longer needed.
As mentioned this is just some thoughts and reflections, it may or it may not fit your situation. One pitfall may be the number of different intervals that you use for reporting. If larger intervals is not multiples of the partitioning interval this wont work well.
Edit: How is the plan affected by creating an temp table and an index on "timestamp"?
create index X on market_trades ( timestamp );
create temp table T ( time_range tsrange );
insert into T ( time_range )
WITH vals AS (
SELECT '2013-08-19 0:00'::timestamp AS frame_start,
'2013-08-26 0:00'::timestamp AS frame_end,
'17h'::interval AS interval_length
), intervals AS (
SELECT tsrange(start_time,
lead(start_time, 1, frame_end) OVER (ORDER BY start_time NULLS FIRST)) AS time_range
FROM (
SELECT generate_series(frame_start, frame_end, interval_length) AS
start_time, frame_end
FROM vals
) _
WHERE start_time < frame_end
)
SELECT time_range
FROM intervals i;
create index x on T ( time_range );
analyze t;
SELECT time_range, count(td.id) AS agg
FROM T
LEFT JOIN market_trades td
ON td.timestamp <@ T.time_range
GROUP BY T.time_range
ORDER BY T.time_range;
Maybe this can helps you.
I've divided the clock in groups of 5 minutes:
group of 5 min = extract('minute' from time)::int / 5
min 0 to 4 = 0
min 5 to 9 = 1
....
This adds an extra column to your current table.
select *, extract('minute' from time)::int / 5 as g_of_5m
from foo
order by ip, time;
+----+----------+-----------------+------------+---------+
| | time | ip | type | g_of_5m |
+----+----------+-----------------+------------+---------+
| 1 | 22:35:00 | 111.111.111.111 | page_open | 7 |
+----+----------+-----------------+------------+---------+
| 2 | 22:36:00 | 111.111.111.111 | page_open | 7 |
+----+----------+-----------------+------------+---------+
| 3 | 22:37:00 | 111.111.111.111 | page_close | 7 |
+----+----------+-----------------+------------+---------+
| 4 | 22:42:00 | 111.111.111.111 | page_close | 8 |
+----+----------+-----------------+------------+---------+
| 5 | 22:44:00 | 111.111.111.111 | page_open | 8 |
+----+----------+-----------------+------------+---------+
| 6 | 22:42:00 | 222.111.111.111 | page_open | 8 |
+----+----------+-----------------+------------+---------+
| 7 | 22:43:00 | 222.111.111.111 | page_open | 8 |
+----+----------+-----------------+------------+---------+
| 8 | 22:44:00 | 222.111.111.111 | page_close | 8 |
+----+----------+-----------------+------------+---------+
As Evan Carroll has pointed in his comment, it should be grouped by hour
,g_of_5m
, just to take care of this case: (same g_of_5m, more than hour)
22:40 111.111.111.111 page_open
23:41 111.111.111.111 page_open
And this is the query:
select min(time) as init_time, ip, type, count(*) as count
from foo
group by ip, type, extract('hour' from time), extract('minute' from time)::int / 5
order by ip, min(time);
+----+-----------+-----------------+------------+-------+
| | init_time | ip | type | count |
+----+-----------+-----------------+------------+-------+
| 1 | 22:35:00 | 111.111.111.111 | page_open | 2 |
+----+-----------+-----------------+------------+-------+
| 2 | 22:37:00 | 111.111.111.111 | page_close | 1 |
+----+-----------+-----------------+------------+-------+
| 3 | 22:42:00 | 111.111.111.111 | page_close | 1 |
+----+-----------+-----------------+------------+-------+
| 4 | 22:44:00 | 111.111.111.111 | page_open | 1 |
+----+-----------+-----------------+------------+-------+
| 5 | 22:42:00 | 222.111.111.111 | page_open | 2 |
+----+-----------+-----------------+------------+-------+
| 6 | 22:44:00 | 222.111.111.111 | page_close | 1 |
+----+-----------+-----------------+------------+-------+
Just if you want to check it: http://rextester.com/CVWBM15291
Best Answer
As I mentioned in my comment, your request is not well defined because it is unclear how the 4-minute wide time windows are supposed to be chosen. In addition, the nature and importance of the
id
column is unclear.Be that as it may, I came up with the following query. It uses window functions in order to define the group edges, and then filters out any row that is not an edge: