PostgreSQL – Indices for a join query with datetime range

optimization, performance, postgresql, query-performance

Given the following query:

SELECT DISTINCT COUNT(DISTINCT "patients"."id") 
FROM "patients" 
INNER JOIN "transactions" ON "transactions"."patient_data_id" = "patients"."patient_data_id" 
WHERE "patients"."practice_id" = 6 
AND "patients"."species" = 'canine' 
AND "transactions"."practice_id" = 6 
AND (("transactions"."transaction_date" BETWEEN '2015-01-01' AND '2015-12-31' OR 1=0));

This query is generated by Rails, which is where the OR 1=0 shortcut comes from.

What indices would you make sure were in place to ensure that this query runs as quickly as possible?

I'm working with a large database in which the transactions table has around 200 million rows. The patients table is much smaller.

Execution time with just one query running is around 2.3 seconds, but we need to be able to run about 50 of these queries simultaneously. When that happens, they bog down dramatically, to roughly one query per minute.

How would you make this run faster?

Best Answer

For this specific query, I would create indexes on patients(practice_id, species) and transactions(patient_data_id, practice_id, transaction_date) and then rewrite the query as follows:

select count(distinct p.id)
from patients p
where p.practice_id = 6 and p.species = 'canine'
    and exists (select 1 
                from transactions 
                where patient_data_id = p.patient_data_id 
                and practice_id = 6 
                and transaction_date BETWEEN '2015-01-01' AND '2015-12-31')
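
For completeness, the two indexes could be created along these lines (a sketch; the index names are just illustrative, and CONCURRENTLY only serves to avoid blocking writes while building an index on a 200-million-row table):

-- index names are illustrative
CREATE INDEX CONCURRENTLY patients_practice_species_idx
    ON patients (practice_id, species);

CREATE INDEX CONCURRENTLY transactions_pdid_practice_date_idx
    ON transactions (patient_data_id, practice_id, transaction_date);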

With EXISTS, the database doesn't have to go through all of a patient's transactions; it can stop at the first one that meets the conditions, which is all you need for the final result (essentially, patients that have at least one matching transaction). I also removed the outer DISTINCT since you don't need it: COUNT returns only one row.
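
If you want to confirm the rewrite behaves as expected, you could look at the plan with EXPLAIN (ANALYZE, BUFFERS); a sketch, to be run against your own data:

EXPLAIN (ANALYZE, BUFFERS)
SELECT count(distinct p.id)
FROM patients p
WHERE p.practice_id = 6 AND p.species = 'canine'
  AND EXISTS (SELECT 1
              FROM transactions
              WHERE patient_data_id = p.patient_data_id
                AND practice_id = 6
                AND transaction_date BETWEEN '2015-01-01' AND '2015-12-31');

You would hope to see an index scan on the new transactions index inside the semi-join rather than a sequential scan over the 200 million rows.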