PostgreSQL – How to Fix Very Slow Simple Query on RDS

indexindex-tuningperformancepostgresqlpostgresql-performance

I seem to be getting very slow queries on a medium sized RDS box (db.m3.medium, 3.7gb ram).

This is across a table of 4,152,928 rows..

select sum(some_field) c
from pages
where pages.some_id=123
and pages.first_action_at > '2014-01-01 00:00:00 +1000'

Total runtime: 45031 ms.
Locally, I have around 1.1million rows, and the same query takes about 450ms..

Here's the query plan, from explain:

Aggregate  (cost=475640.59..475640.60 rows=1 width=4)
   ->  Seq Scan on pages  (cost=0.00..475266.07 rows=149809 width=4)
         Filter: ((first_action_at > '2014-01-01 00:00:00'::timestamp without time zone) 
                AND (some_id = 447))

Here's the response from explain analyze:

 Aggregate  (cost=475641.74..475641.76 rows=1 width=4) (actual time=42419.717..42419.718 rows=1 loops=1)
   ->  Seq Scan on pages  (cost=0.00..475267.22 rows=149810 width=4) (actual time=0.013..42265.908 rows=141559 loops=1)
    Filter: ((first_action_at > '2014-01-01 00:00:00'::timestamp without time zone) AND (some_id = 447))
    Rows Removed by Filter: 4011369

Total runtime: 42419.772 ms

For reference, 141559 rows are part of the sum().

The current indexes I have are:

:some_id
:some_id, :first_action_at

work_mem was previously set to 1 mb (RDS default). I've just changed this to 18 mb.

Edit: Seems to be resolved by upping work_mem as well as added the second index above, speed is now around 800 ms.

Best Answer

Matching index

After re-reading your question I realized you are not running Amazon Redshift, but Amazon RDS, which seems to be running unsullied Postgres, at least according to the documentation:

Amazon RDS supports DB instances running several versions of PostgreSQL. Currently we support PostgreSQL versions 9.3.1, 9.3.2, and 9.3.3.

This would mean you have index-only scans at your disposal. If you meet some preconditions (basically if vacuum can keep up with write operations) and if some_field is not updated to often and reasonably small (which seems to be the case for a numeric column), the perfect index would include some_field in last position (like @zerkms first mentioned):

CREATE INDEX ON pages(some_id, first_action_at, some_field);

Note that some_id should come before first_action_at, because it is typically more efficient to have columns with equality checks first and ranges later. Details:
Multicolumn index and performance

If you don't see "index-only scan" in EXPLAIN ANALYZE, the last column is just ballast and better left away:

CREATE INDEX ON pages(some_id, first_action_at);

(Like you have now, according to your question update.)

Either way, another index on just (some_id) only offers very little over this multicolumn index:
Is a composite index also good for queries on the first field?

Server configuration

All the usual advice for slow queries and proper server configuration applies and a work_mem setting of 1 MB is much too low for a DB with millions of rows. But this particular setting should not be crucial for this particular query, since work_mem is (per documentation):

memory to be used by internal sort operations and hash tables.

Neither applies here.