PostgreSQL – Optimizing a query on a large table (EXPLAIN ANALYZE result inside)

postgresql

I have a table market_trades with a timestamp column:

api_production=# \d market_trades;
                                       Table "public.market_trades"
     Column     |            Type             |                         Modifiers                          
----------------+-----------------------------+------------------------------------------------------------
 id             | integer                     | not null default nextval('market_trades_id_seq'::regclass)
 market_id      | integer                     | not null
 timestamp      | timestamp without time zone | not null
 amount         | numeric(16,8)               | not null
 price          | numeric(16,8)               | not null
 created_at     | timestamp without time zone | 
 local_id       | integer                     | not null
Indexes:
    "market_trades_pkey" PRIMARY KEY, btree (id)
    "market_trades__uniqueness" UNIQUE, btree (market_id, "timestamp", local_id)

which has about 30 million rows.

In the simplest case, I need to take a time range and sub-interval length, break the time range into sub-intervals of the given length, and count the number of trades with timestamp in each sub-interval.

For example, if the time range is [2013-01-01 00:00, 2013-02-01 00:00) and the sub-interval length is 1 day, then my result set will include 31 rows (one for each day), and will list the total number of records for that day.

I already designed a query to do this:

WITH vals AS (
    SELECT '2013-08-19 0:00'::timestamp  AS frame_start,
           '2013-08-26 0:00'::timestamp  AS frame_end,
           '17h'::interval               AS interval_length
),   intervals AS (
    SELECT tsrange(start_time,
                   lead(start_time, 1, frame_end) OVER (ORDER BY start_time NULLS FIRST)) AS time_range
    FROM (
        SELECT generate_series(frame_start, frame_end, interval_length) AS start_time,
               frame_end
        FROM vals
    ) _
    WHERE start_time < frame_end
)
SELECT time_range, count(td.id) AS agg
FROM intervals i
LEFT JOIN market_trades td
ON td.timestamp <@ i.time_range
GROUP  BY time_range
ORDER  BY time_range;

which outputs

                  time_range                   |  agg  
-----------------------------------------------+-------
 ["2013-08-19 00:00:00","2013-08-19 17:00:00") | 10852
 ["2013-08-19 17:00:00","2013-08-20 10:00:00") |  4727
 ["2013-08-20 10:00:00","2013-08-21 03:00:00") |  7875
 ["2013-08-21 03:00:00","2013-08-21 20:00:00") | 12137
 ["2013-08-21 20:00:00","2013-08-22 13:00:00") |  8454
 ["2013-08-22 13:00:00","2013-08-23 06:00:00") | 10284
 ["2013-08-23 06:00:00","2013-08-23 23:00:00") |  8385
 ["2013-08-23 23:00:00","2013-08-24 16:00:00") |  4978
 ["2013-08-24 16:00:00","2013-08-25 09:00:00") |  4709
 ["2013-08-25 09:00:00","2013-08-26 00:00:00") |  8620
(10 rows)

Time: 64433.256 ms

The problem is that it is VERY slow (as you can see, that query took over a minute to run!). So, naturally, I tried adding a btree index on the timestamp column, but it didn't help at all. I've also found that the runtime is proportional to the number of sub-intervals: for example, doubling both the overall time range and the sub-interval length, which leaves the number of sub-intervals unchanged, takes about the same amount of time as the original.
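
(For reference, the index I added was just a plain btree on the column, along these lines; the exact name doesn't matter:)

CREATE INDEX index_market_trades_on_timestamp ON market_trades ("timestamp");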

Here is the output of EXPLAIN ANALYZE on the above query:

                                                                        QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=198931037.42..198931037.92 rows=200 width=36) (actual time=460481.744..460481.750 rows=10 loops=1)
   Sort Key: i.time_range
   Sort Method: quicksort  Memory: 25kB
   CTE vals
     ->  Result  (cost=0.00..0.01 rows=1 width=0) (actual time=0.002..0.003 rows=1 loops=1)
   CTE intervals
     ->  WindowAgg  (cost=31.47..38.13 rows=333 width=16) (actual time=0.057..0.174 rows=10 loops=1)
           ->  Sort  (cost=31.47..32.30 rows=333 width=16) (actual time=0.047..0.063 rows=10 loops=1)
                 Sort Key: _.start_time
                 Sort Method: quicksort  Memory: 25kB
                 ->  Subquery Scan on _  (cost=0.00..17.52 rows=333 width=16) (actual time=0.012..0.034 rows=10 loops=1)
                       Filter: (_.start_time < _.frame_end)
                       ->  CTE Scan on vals  (cost=0.00..5.02 rows=1000 width=32) (actual time=0.009..0.019 rows=10 loops=1)
   ->  HashAggregate  (cost=198930989.64..198930991.64 rows=200 width=36) (actual time=460481.713..460481.721 rows=10 loops=1)
         ->  Nested Loop Left Join  (cost=0.00..198881140.83 rows=9969761 width=36) (actual time=25612.620..460406.061 rows=81021 loops=1)
               Join Filter: (td."timestamp" <@ i.time_range)
               Rows Removed by Join Filter: 299306019
               ->  CTE Scan on intervals i  (cost=0.00..6.66 rows=333 width=32) (actual time=0.060..0.221 rows=10 loops=1)
               ->  Materialize  (cost=0.00..875147.34 rows=29939223 width=12) (actual time=0.006..24275.128 rows=29938704 loops=10)
                     ->  Seq Scan on market_trades td  (cost=0.00..579263.23 rows=29939223 width=12) (actual time=0.008..22103.565 rows=29938704 loops=1)
 Total runtime: 460617.278 ms
(21 rows)

Any suggestions?

EDIT: The problem is clearly that it's checking td.timestamp <@ i.time_range for every pair of a trade and an interval. There must be a better way to filter records within a timestamp range.

EDIT 2: If I change the line

ON td.timestamp <@ i.time_range

to

ON td.timestamp >= lower(i.time_range) AND td.timestamp < upper(i.time_range)

then it executes MUCH faster, so long as timestamp is indexed.

But this is still not ideal, because the nice thing about time ranges is that they properly handle NULL bounds (no lower bound or no upper bound), and checking for those NULL conditions manually in the ON clause slows down the query even more than the original.
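
One workaround might be to substitute infinite bounds instead of testing for NULL explicitly, which keeps the comparison in the index-friendly >=/< form (an untested sketch):

ON  td.timestamp >= COALESCE(lower(i.time_range), '-infinity'::timestamp)
AND td.timestamp <  COALESCE(upper(i.time_range), 'infinity'::timestamp)

though I haven't verified that the planner still parameterizes the index scan on timestamp with those expressions.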

Is there an indexing strategy (or better query structure) that makes working with time ranges as fast as start/end points?
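
(One idea I have not benchmarked: a GiST index on a degenerate, single-point range built from the column, so that the range containment operator itself becomes indexable. The index name below is made up:)

CREATE INDEX market_trades_point_range_gist
    ON market_trades USING gist (tsrange("timestamp", "timestamp", '[]'));

-- and then join with the matching expression:
ON tsrange(td.timestamp, td.timestamp, '[]') <@ i.time_range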

Best Answer

A few thoughts. Consider range partitioning on "timestamp"; it will reduce the amount of work your query has to do. A further optimization might be to calculate and store the aggregates for "closed" partitions. You will, of course, have to recalculate these whenever you modify historical data.

As a bonus, it will be much easier to roll out (drop) historical data that is no longer needed.

As mentioned, these are just some thoughts and reflections; they may or may not fit your situation. One pitfall may be the number of different intervals you use for reporting: if the larger intervals are not multiples of the partitioning interval, this won't work well.
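
To make this concrete, here is a rough sketch of what monthly partitioning could look like on a 9.x-era server (inheritance plus CHECK constraints; the names and bounds are only examples, and you would still need a trigger or application logic to route inserts to the right child):

create table market_trades_2013_08 (
    check ("timestamp" >= '2013-08-01'::timestamp
       and "timestamp" <  '2013-09-01'::timestamp)
) inherits (market_trades);

create index market_trades_2013_08_ts on market_trades_2013_08 ("timestamp");

-- with constraint_exclusion = partition (the default), the planner can skip
-- children whose CHECK constraint cannot overlap the queried time range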

Edit: How is the plan affected by creating a temp table and an index on "timestamp"?

create index X on market_trades ( timestamp );

create temp table T ( time_range tsrange );

insert into T ( time_range )
WITH vals AS (
    SELECT '2013-08-19 0:00'::timestamp  AS frame_start,
           '2013-08-26 0:00'::timestamp  AS frame_end,
           '17h'::interval               AS interval_length
),   intervals AS (
    SELECT tsrange(start_time,
                   lead(start_time, 1, frame_end) OVER (ORDER BY start_time NULLS FIRST)) AS time_range
    FROM (
        SELECT generate_series(frame_start, frame_end, interval_length) AS start_time,
               frame_end
        FROM vals
    ) _
    WHERE start_time < frame_end
)
SELECT time_range
FROM intervals i;

create index x on T ( time_range );

analyze t;

SELECT time_range, count(td.id) AS agg
FROM T
LEFT JOIN market_trades td
    ON td.timestamp <@ T.time_range
GROUP  BY T.time_range
ORDER  BY T.time_range;
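
If the <@ predicate on the temp table still ends up as a sequential scan, it might also be worth comparing the plan against the sargable form from the question's second edit (a sketch, not benchmarked):

SELECT T.time_range, count(td.id) AS agg
FROM T
LEFT JOIN market_trades td
    ON  td.timestamp >= lower(T.time_range)
    AND td.timestamp <  upper(T.time_range)
GROUP  BY T.time_range
ORDER  BY T.time_range;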