Can you please help with a slow query we have over a large amount of data (~500K rows)? We're on Postgres 10.
SELECT mike_romeo, SUM(hotel) AS counts
FROM whiskey_lima
WHERE yankee = 'kilo'
  AND whiskey_india = 'a_value'
  AND echo = 'romeo_november'
GROUP BY mike_romeo;
And my table looks like this:
Column | Type | Collation | Nullable | Default
---------------+-----------------------------+-----------+----------+--------------------------------------------------------
id | bigint | | not null | nextval('id_seq'::regclass)
yankee | mike_hotel.quebec | | not null |
whiskey_india | character varying | | not null |
echo | mike_hotel.romeo_yankee | | not null |
mike_romeo | character varying | | not null |
oscar | timestamp without time zone | | not null |
hotel | integer | | not null |
papa | timestamp without time zone | | |
Indexes:
"whiskey_lima_pkey" PRIMARY KEY, btree (id)
"index_unique" UNIQUE, btree (echo, mike_romeo, yankee, whiskey_india, oscar)
"index_query" btree (yankee, whiskey_india, echo)
"index_on_oscar" btree (oscar)
Our EXPLAIN (ANALYSE, BUFFERS) output looks like this: https://explain.depesz.com/s/4lU
GroupAggregate  (cost=77294.010..78398.740 rows=38882 width=69) (actual time=56343.468..59194.302 rows=527807 loops=1)
  Group Key: mike_romeo
  Buffers: shared hit=61428 read=68664, temp read=23413 written=23450
  ->  Sort  (cost=77294.010..77532.650 rows=95455 width=65) (actual time=56343.461..58848.379 rows=530929 loops=1)
        Sort Key: mike_romeo
        Sort Method: external merge  Disk: 111488kB
        Buffers: shared hit=61428 read=68664, temp read=23413 written=23450
        ->  Index Scan using index_query on whiskey_lima  (cost=0.560..67104.720 rows=95455 width=65) (actual time=1.647..38814.509 rows=530929 loops=1)
              Index Cond: ((yankee = 'kilo'::mike_hotel.quebec) AND ((whiskey_india)::text = 'a_value'::text) AND (echo = 'romeo_november'::mike_hotel.romeo_yankee))
              Buffers: shared hit=61425 read=68664
Questions
What should we focus on – are there any obvious solutions?
Can we avoid the sort?
Why has the query planner underestimated the rows, and could that be causing some of the slowness? We're trying to enable extended statistics (see the first sketch after this list).
We're looking into increasing work_mem so that the disk sort is mitigated (see the second sketch after this list). Would that alone make the whole query performant?
What else am I missing?
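For the extended-statistics point, here is a sketch of what we're trying (the statistics name is our own choice; Postgres 10 supports the dependencies kind for correlated columns):

CREATE STATISTICS whiskey_lima_where_deps (dependencies)
    ON yankee, whiskey_india, echo
    FROM whiskey_lima;
ANALYZE whiskey_lima;  -- extended statistics are only gathered on the next ANALYZE

And for the work_mem point, a session-level experiment (the value is a guess; the sort wrote ~109 MB to disk, and in-memory sorts typically need more memory than their on-disk footprint):

SET work_mem = '256MB';
-- re-run EXPLAIN (ANALYSE, BUFFERS) and check whether the external merge
-- becomes an in-memory quicksort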
Thanks a lot for your help
Best Answer
I would try an index like the following (a sketch; the index name is just a placeholder):
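-- the three equality predicates first, then the GROUP BY column
CREATE INDEX index_predicates_first
    ON whiskey_lima (yankee, whiskey_india, echo, mike_romeo);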
Perhaps you can instead replace your index_unique with something like this (same five columns, so the uniqueness guarantee is unchanged):
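CREATE UNIQUE INDEX index_unique_reordered
    ON whiskey_lima (yankee, whiskey_india, echo, mike_romeo, oscar);
-- once the new index is built and validated:
DROP INDEX index_unique;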
The idea is to put the WHERE predicates first in the index, followed by the GROUP BY columns.
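With the three equality predicates leading and mike_romeo immediately after them, an index scan returns the matching rows already ordered by mike_romeo, so the GroupAggregate can stream its input and the 111 MB external merge sort should disappear. If you also want to chase an index-only scan, note that Postgres 10 has no INCLUDE clause (that arrived in 11), but you could append hotel as a trailing key column, for example (again, the name is a placeholder):

CREATE INDEX index_covering
    ON whiskey_lima (yankee, whiskey_india, echo, mike_romeo, hotel);

An index-only scan additionally needs the table's visibility map to be reasonably current, so keep autovacuum happy.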