Up until now all queries in my PostgreSQL application were fast. During the last days, sometimes a query takes several hours. Index are healthy, because after dump, restore, vacuum -z, it is still the same: the query takes hours.
EXPLAIN VERBOSE SELECT DISTINCT "modwork_beleg"."direction"
FROM "modwork_beleg"
LEFT OUTER JOIN "modwork_isu_isu_pkunde" ON ("modwork_beleg"."id" = "modwork_isu_isu_pkunde"."beleg_id")
LEFT OUTER JOIN "modwork_isu_isu_vknr" ON ("modwork_beleg"."id" = "modwork_isu_isu_vknr"."beleg_id")
LEFT OUTER JOIN "modwork_isu_isu_gpnr" ON ("modwork_beleg"."id" = "modwork_isu_isu_gpnr"."beleg_id")
WHERE ("modwork_beleg"."state" IN (E'neu', E'inarbeit', E'wiedervorlage')
AND (
"modwork_isu_isu_pkunde"."pkunde" = 90237758
OR
"modwork_isu_isu_vknr"."vknr" = 254400297729
OR
"modwork_isu_isu_gpnr"."gpnr" = 1001030921
));
QUERY PLAN:
HashAggregate (cost=36409.29..36409.30 rows=1 width=3)
Output: modwork_beleg.direction
-> Merge Right Join (cost=28836.70..36409.29 rows=1 width=3)
Output: modwork_beleg.direction
Merge Cond: (modwork_isu_isu_vknr.beleg_id = modwork_beleg.id)
Filter: ((modwork_isu_isu_pkunde.pkunde = 90237758) OR (modwork_isu_isu_vknr.vknr = 254400297729::bigint) OR (modwork_isu_isu_gpnr.gpnr = 1001030921))
-> Index Scan using modwork_isu_isu_vknr_beleg_id on modwork_isu_isu_vknr (cost=0.00..6463.55 rows=203811 width=12)
Output: modwork_isu_isu_vknr.id, modwork_isu_isu_vknr.beleg_id, modwork_isu_isu_vknr.vknr
-> Sort (cost=28836.70..28876.79 rows=16037 width=19)
Output: modwork_beleg.direction, modwork_beleg.id, modwork_isu_isu_gpnr.gpnr, modwork_isu_isu_pkunde.pkunde
Sort Key: modwork_beleg.id
-> Merge Right Join (cost=21128.57..27716.58 rows=16037 width=19)
Output: modwork_beleg.direction, modwork_beleg.id, modwork_isu_isu_gpnr.gpnr, modwork_isu_isu_pkunde.pkunde
Merge Cond: (modwork_isu_isu_gpnr.beleg_id = modwork_beleg.id)
-> Index Scan using modwork_isu_isu_gpnr_beleg_id on modwork_isu_isu_gpnr (cost=0.00..5883.73 rows=185491 width=12)
Output: modwork_isu_isu_gpnr.id, modwork_isu_isu_gpnr.beleg_id, modwork_isu_isu_gpnr.gpnr
-> Sort (cost=21128.57..21157.69 rows=11646 width=11)
Output: modwork_beleg.direction, modwork_beleg.id, modwork_isu_isu_pkunde.pkunde
Sort Key: modwork_beleg.id
-> Merge Right Join (cost=14555.56..20342.03 rows=11646 width=11)
Output: modwork_beleg.direction, modwork_beleg.id, modwork_isu_isu_pkunde.pkunde
Merge Cond: (modwork_isu_isu_pkunde.beleg_id = modwork_beleg.id)
-> Index Scan using modwork_isu_isu_pkunde_beleg_id on modwork_isu_isu_pkunde (cost=0.00..5203.79 rows=163197 width=8)
Output: modwork_isu_isu_pkunde.id, modwork_isu_isu_pkunde.beleg_id, modwork_isu_isu_pkunde.pkunde
-> Sort (cost=14555.56..14573.39 rows=7134 width=7)
Output: modwork_beleg.direction, modwork_beleg.id
Sort Key: modwork_beleg.id
-> Bitmap Heap Scan on modwork_beleg (cost=140.06..14098.97 rows=7134 width=7)
Output: modwork_beleg.direction, modwork_beleg.id
Recheck Cond: ((state)::text = ANY ('{neu,inarbeit,wiedervorlage}'::text[]))
-> Bitmap Index Scan on modwork_beleg_state (cost=0.00..138.28 rows=7134 width=0)
Index Cond: ((state)::text = ANY ('{neu,inarbeit,wiedervorlage}'::text[]))
(32 rows)
My PostgreSQL version:
modwork_egs_q=> select version();
version
----------------------------------------------------------------------------------------------------------------------------------------
PostgreSQL 8.4.8 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292], 64-bit
Best Answer
First, I simplified your syntax quite a bit to make it more readable:
Next, and this is the important step concerning performance:
The point is, that in your original query you form one big table out of all possible combinations. If the tables
modwork_isu_isu_*
each have, say, 100 rows perbeleg_id
, this results in a huge table of 100x100x100 = 1 million rows, holding all the columns of all four tables. Then you select a couple of them (your query plan shows 32 resulting rows). This is extremely inefficient.The good news is, your query can easily be broken into three parts. I did not test, but I'd bet this is faster by several orders of magnitude.
I also changed the join type from
LEFT [OUTER] JOIN
to[INNER] JOIN
, because with three separate queries, the condition on the right table makes each aJOIN
effectively anyway.Finally, using
UNION
instead ofUNION ALL
when combining the results of the three queries removes duplicate values inb.direction
. So I can drop the redundantDISTINCT
for each of the queries. All simpler and much faster.