I have this table:
CREATE TABLE spp.rtprices (
"interval" timestamp without time zone NOT NULL,
rtlmp numeric(12,6),
rtmcc numeric(12,6),
rtmcl numeric(12,6),
node_id integer NOT NULL,
CONSTRAINT rtprices_pkey PRIMARY KEY ("interval", node_id),
CONSTRAINT rtprices_node_id_fkey FOREIGN KEY (node_id)
REFERENCES spp.nodes (node_id) MATCH SIMPLE
ON UPDATE RESTRICT ON DELETE RESTRICT
)
And one more relevant index:
CREATE INDEX rtprices_node_id_interval_idx ON spp.rtprices (node_id, "interval");
Against it I've made this view:
CREATE OR REPLACE VIEW spp.rtprices_hourly AS
SELECT (rtprices."interval" - '00:05:00'::interval)::date::timestamp without time zone AS pricedate,
date_part('hour'::text, date_trunc('hour'::text, rtprices."interval" - '00:05:00'::interval))::integer + 1 AS hour,
rtprices.node_id,
round(avg(rtprices.rtlmp), 2) AS rtlmp,
round(avg(rtprices.rtmcc), 2) AS rtmcc,
round(avg(rtprices.rtmcl), 2) AS rtmcl
FROM spp.rtprices
GROUP BY date_part('hour'::text, date_trunc('hour'::text, rtprices."interval" - '00:05:00'::interval))::integer + 1,
rtprices.node_id,
(rtprices."interval" - '00:05:00'::interval)::date::timestamp without time zone;
The point of which is to give averages of the numeric columns for each hour (The timestamps have data every 5 minutes). The problem is that a query for a single day for a single node_id
takes upwards of 30 seconds for 24 records.
explain analyze select * from spp.rtprices_hourly
where node_id=20 and pricedate='2015-02-02'
Returns this:
"HashAggregate (cost=1128767.71..1128773.79 rows=135 width=28) (actual time=31155.023..31155.065 rows=24 loops=1)"
" Group Key: ((date_part('hour'::text, date_trunc('hour'::text, (rtprices."interval" - '00:05:00'::interval))))::integer + 1), rtprices.node_id, (((rtprices."interval" - '00:05:00'::interval))::date)::timestamp without time zone"
" -> Bitmap Heap Scan on rtprices (cost=10629.42..1128732.91 rows=2320 width=28) (actual time=25071.410..31153.715 rows=288 loops=1)"
" Recheck Cond: (node_id = 20)"
" Rows Removed by Index Recheck: 7142233"
" Filter: (((("interval" - '00:05:00'::interval))::date)::timestamp without time zone = '2015-02-02 00:00:00'::timestamp without time zone)"
" Rows Removed by Filter: 124909"
" Heap Blocks: exact=43076 lossy=82085"
" -> Bitmap Index Scan on rtprices_node_id_interval_idx (cost=0.00..10628.84 rows=464036 width=0) (actual time=68.999..68.999 rows=125197 loops=1)"
" Index Cond: (node_id = 20)"
"Planning time: 5.243 ms"
"Execution time: 31155.392 ms"
Best Answer
Simpler view
For this objective:
.. it seems just as good to truncate to full hours, which is simpler and cheaper:
Faster query
Either way, an equivalent query on the view with sargable predicates would be:
This is faster, but still not as fast as it could be. The major performance hit is because the index can only be used with an index condition on
node_id
, which is preserved in its original state in the view. That's why your indexrtprices_node_id_interval_idx
withnode_id
first is important. Why?The second predicate on
hour
has to be filtered after the tuple has been fetched from the heap (the row has been read from the table). The large majority of rows is discarded late into the process, much work for nothing.Much faster with direct query
It would be much faster to run the original query and apply predicates before you aggregate:
You'll see index conditions for all predicates now. The more efficient index is still the one with
node_id
first. Why?Fast & short: create a function
So, this won't work nicely with a view. Use a function instead:
Now you get top performance with a simple query:
I put in a convenience feature, the second parameter defaults to "one day later" if you omit it:
More about function parameters and default values:
You can query any range: