Postgresql – How to make PG query faster even when it is using indices

optimizationperformancepostgresqlquery-performance

I have a PG table with about 30M records. I'm noticing that when I query the table with IN clause that would fetch lot of rows, the query is taking a long time to respond (~3 minutes) Question Although my query is using indices, how can I make the query faster?

mydb=> explain analyze select count(scores), sum(scores) from mytable 
       where classes in ('1st','2nd','3rd') and school_year in ('2018');


QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=1150864.84..1150864.85 rows=1 width=40) (actual time=185454.773..185454.773 rows=1 loops=1)
   ->  Bitmap Heap Scan on mytable  (cost=16998.63..1146247.22 rows=923525 width=6) (actual time=934.546..184704.246 rows=967663 loops=1)
         Recheck Cond: ((classes)::text = ANY ('{1st,2nd,3rd}'::text[]))
         Filter: (school_year = 2018)
         Heap Blocks: exact=448386
         ->  Bitmap Index Scan on mytable_classes_idx  (cost=0.00..16767.75 rows=923525 width=0) (actual time=736.690..736.690 rows=967663 loops=1)
               Index Cond: ((classes)::text = ANY ('{1st,2nd,3rd}'::text[]))
 Planning time: 1.252 ms
 Execution time: 185463.153 ms
(9 rows)

Additional Details

                                                version
--------------------------------------------------------------------------------------------------------
 PostgreSQL 10.9 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.3 20140911 (Red Hat 4.8.3-9), 64-bit
(1 row)

Best Answer

One odd thing is that the filter Filter: (school_year = 2018) doesn't remove any rows. So either the entire table has school_year = 2018, or all the rows which meet the criteria classes in ('1st','2nd','3rd') have school_year = 2018. Either way, it seems a bit odd. Is this how your real problem is, or is it an artefact of creating a smaller demonstration problem?

The slowness probably comes from the IO needed to read a large amount of the table. Doing the EXPLAIN (ANALYZE, BUFFERS), especially with "track_io_timing" turned on, would be especially helpful to figure that out.

If so, creating an index that includes all of the columns, like:

create index on mytable (classes, school_year, score);

should make this much faster. It will allow an index-only scan, instead of an scan which still needs to visit the table for each row. But for this to be effective, the table must be kept well-vacuumed. The default settings of autovac will probably not be enough.