Postgresql – What indexes would I create for a query like this

indexpostgresql

Here's a query I've got:

SELECT "balance_transactions"."fee"
FROM "balance_transactions" 
JOIN charges ON balance_transactions.source = charges.balance_id 
WHERE "balance_transactions"."account_id" = 123 
AND (balance_transactions.type = 'charge' 
AND charges.refunded = false 
AND charges.invoice IS NOT NULL) 
AND ("balance_transactions"."created" BETWEEN '2013-12-29' AND '2014-01-28');

What's hurting my brain is figuring out what I need to index to optimize a query like this.

Would I create individual indexes on each column in that WHERE statement? Or a single index that encompasses multiple columns? Or something else?

I happen to be using Postgres.

Here's the output of EXPLAIN ANALYZE:

 Nested Loop  (cost=40.44..9294.43 rows=744 width=4) (actual time=0.791..32.687 rows=1131 loops=1)
   ->  Bitmap Heap Scan on balance_transactions  (cost=40.02..2266.08 rows=1023 width=22) (actual time=0.737..1.006 rows=1201 loops=1)
         Recheck Cond: ((account_id = 79) AND ((type)::text = 'charge'::text) AND (created >= '2013-12-29 00:00:00'::timestamp without time zone) AND (created <= '2014-01-28 00:00:00'::timestamp without time zone))
         ->  Bitmap Index Scan on index_balance_transactions_account_type_created  (cost=0.00..39.77 rows=1023 width=0) (actual time=0.716..0.716 rows=1201 loops=1)
               Index Cond: ((account_id = 79) AND ((type)::text = 'charge'::text) AND (created >= '2013-12-29 00:00:00'::timestamp without time zone) AND (created <= '2014-01-28 00:00:00'::timestamp without time zone))
   ->  Index Scan using index_stripe_charges_on_stripe_id on charges  (cost=0.42..6.86 rows=1 width=18) (actual time=0.025..0.025 rows=1 loops=1201)
         Index Cond: ((balance_id)::text = (balance_transactions.source)::text)
         Filter: ((NOT refunded) AND (invoice IS NOT NULL))
         Rows Removed by Filter: 0
 Total runtime: 32.924 ms
(10 rows)

Best Answer

I would suggest the following two indexes:

charges(balance_id, refunded, invoice)
balance_transactions(account_id, type, created, source, fee)

The first is a covering index for charges. The index should be used for the query with no need for the data tables.

The second is a covering index for balance_transactions. The first three will satisfy the where clause. The four column will be used for the join and the final is needed for the select.