I seem to be getting very slow queries on a medium sized RDS box (db.m3.medium, 3.7gb ram).
This is across a table of 4,152,928 rows..
select sum(some_field) c
from pages
where pages.some_id=123
and pages.first_action_at > '2014-01-01 00:00:00 +1000'
Total runtime: 45031 ms.
Locally, I have around 1.1million rows, and the same query takes about 450ms..
Here's the query plan, from explain:
Aggregate (cost=475640.59..475640.60 rows=1 width=4)
-> Seq Scan on pages (cost=0.00..475266.07 rows=149809 width=4)
Filter: ((first_action_at > '2014-01-01 00:00:00'::timestamp without time zone)
AND (some_id = 447))
Here's the response from explain analyze:
Aggregate (cost=475641.74..475641.76 rows=1 width=4) (actual time=42419.717..42419.718 rows=1 loops=1)
-> Seq Scan on pages (cost=0.00..475267.22 rows=149810 width=4) (actual time=0.013..42265.908 rows=141559 loops=1)
Filter: ((first_action_at > '2014-01-01 00:00:00'::timestamp without time zone) AND (some_id = 447))
Rows Removed by Filter: 4011369
Total runtime: 42419.772 ms
For reference, 141559 rows are part of the sum().
The current indexes I have are:
:some_id
:some_id, :first_action_at
work_mem
was previously set to 1 mb (RDS default). I've just changed this to 18 mb.
Edit: Seems to be resolved by upping work_mem
as well as added the second index above, speed is now around 800 ms.
Best Answer
Matching index
After re-reading your question I realized you are not running Amazon Redshift, but Amazon RDS, which seems to be running unsullied Postgres, at least according to the documentation:
This would mean you have index-only scans at your disposal. If you meet some preconditions (basically if
vacuum
can keep up with write operations) and ifsome_field
is not updated to often and reasonably small (which seems to be the case for a numeric column), the perfect index would includesome_field
in last position (like @zerkms first mentioned):Note that
some_id
should come beforefirst_action_at
, because it is typically more efficient to have columns with equality checks first and ranges later. Details:Multicolumn index and performance
If you don't see "index-only scan" in
EXPLAIN ANALYZE
, the last column is just ballast and better left away:(Like you have now, according to your question update.)
Either way, another index on just
(some_id)
only offers very little over this multicolumn index:Is a composite index also good for queries on the first field?
Server configuration
All the usual advice for slow queries and proper server configuration applies and a
work_mem
setting of 1 MB is much too low for a DB with millions of rows. But this particular setting should not be crucial for this particular query, sincework_mem
is (per documentation):Neither applies here.