Postgres Query Performance – How to Optimize Query with MAX?

performancepostgresqlquery-performance

I'm trying to make a relatively simple query, but it's taking much longer than I'd expect. I have an index in place, but it doesn't seem to be helping much.

Here's the query. It takes over an hour and a half to execute:

SELECT MAX("transactions"."api_last_change_date") AS max_id 
FROM "transactions" 
WHERE "transactions"."practice_id" = 466;

Here's the table, with indices:

                                          Table "public.transactions"
        Column        |            Type             |                         Modifiers
----------------------+-----------------------------+------------------------------------------------------------
 id                   | integer                     | not null default nextval('transactions_id_seq'::regclass)
 installation_id      | character varying(255)      |
 data_provider_id           | character varying(255)      | not null
 transaction_date     | timestamp without time zone |
 pms_code             | character varying(255)      |
 pms_code_data_provider_id  | character varying(255)      |
 description          | text                        |
 quantity             | integer                     |
 amount               | integer                     |
 is_payment           | character varying(255)      |
 trans_type           | character varying(255)      |
 client_pms_id        | character varying(255)      |
 patient_pms_id       | character varying(255)      |
 client_data_provider_id    | character varying(255)      |
 patient_data_provider_id   | character varying(255)      | not null
 provider_id          | character varying(255)      |
 provider_name        | character varying(255)      |
 invoice_id           | character varying(255)      |
 transaction_total    | integer                     |
 practice_id          | integer                     | not null
 api_create_date      | timestamp without time zone | default '2015-10-09 05:00:00'::timestamp without time zone
 api_last_change_date | timestamp without time zone | default '2015-10-09 05:00:00'::timestamp without time zone
 api_removed_date     | timestamp without time zone |
Indexes:
    "transactions_pkey" PRIMARY KEY, btree (id)
    "index_transactions_on_api_last_change_date" btree (api_last_change_date)
    "index_transactions_on_patient_data_provider_id" btree (patient_data_provider_id)
    "index_transactions_on_practice_id" btree (practice_id)
    "index_transactions_on_data_provider_id" btree (data_provider_id)
    "transactions_practice_id_api_last_change_date" btree (practice_id, api_last_change_date DESC NULLS LAST)

Here's the results of explain on the query:

explain SELECT MAX("transactions"."api_last_change_date") AS max_id FROM "transactions" WHERE "transactions"."practice_id" = 466;
                                                                    QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Result  (cost=818.01..818.02 rows=1 width=0)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=0.11..818.01 rows=1 width=8)
           ->  Index Scan Backward using index_transactions_on_api_last_change_date on transactions  (cost=0.11..141374929.80 rows=172851 width=8)
                 Index Cond: (api_last_change_date IS NOT NULL)
                 Filter: (practice_id = 466)
(6 rows)

The transactions table has almost 150 million records. Only about 20000 records have practice_id = 466.

As you can see, there are multiple indices on the table, including one that I thought would work specifically for this query (transactions_practice_id_api_last_change_date), but postgres is choosing to use a different one (index_patients_on_api_last_change_date). From my understanding the index I created should work well, and should be max O(logn) using a btree with the two parameters as the given key.

I was trying to get an explain analyze result for this, but each time I've tried to run it, I've run into connectivity issues before it completes. If I can get it to run successfully, I'll post the results here.

How would I go about improving the performance of this query?

Best Answer

An index on:

transactions (practice_id ASC, api_last_change_date DESC)
WHERE (api_last_change_date IS NOT NULL)

would likely be very helpful here.