PostgreSQL – How to Optimize a Query That Takes Hours

join;performancepostgresql

Up until now all queries in my PostgreSQL application were fast. During the last days, sometimes a query takes several hours. Index are healthy, because after dump, restore, vacuum -z, it is still the same: the query takes hours.

EXPLAIN  VERBOSE SELECT DISTINCT "modwork_beleg"."direction" 
FROM "modwork_beleg" 
LEFT OUTER JOIN "modwork_isu_isu_pkunde" ON ("modwork_beleg"."id" = "modwork_isu_isu_pkunde"."beleg_id")
LEFT OUTER JOIN "modwork_isu_isu_vknr"   ON ("modwork_beleg"."id" = "modwork_isu_isu_vknr"."beleg_id")
LEFT OUTER JOIN "modwork_isu_isu_gpnr"   ON ("modwork_beleg"."id" = "modwork_isu_isu_gpnr"."beleg_id")
WHERE ("modwork_beleg"."state" IN (E'neu', E'inarbeit', E'wiedervorlage')
    AND (
      "modwork_isu_isu_pkunde"."pkunde" = 90237758
      OR
      "modwork_isu_isu_vknr"."vknr" = 254400297729
      OR
      "modwork_isu_isu_gpnr"."gpnr" = 1001030921
));

QUERY PLAN:

 HashAggregate  (cost=36409.29..36409.30 rows=1 width=3)
   Output: modwork_beleg.direction
   ->  Merge Right Join  (cost=28836.70..36409.29 rows=1 width=3)
         Output: modwork_beleg.direction
         Merge Cond: (modwork_isu_isu_vknr.beleg_id = modwork_beleg.id)
         Filter: ((modwork_isu_isu_pkunde.pkunde = 90237758) OR (modwork_isu_isu_vknr.vknr = 254400297729::bigint) OR (modwork_isu_isu_gpnr.gpnr = 1001030921))
         ->  Index Scan using modwork_isu_isu_vknr_beleg_id on modwork_isu_isu_vknr  (cost=0.00..6463.55 rows=203811 width=12)
               Output: modwork_isu_isu_vknr.id, modwork_isu_isu_vknr.beleg_id, modwork_isu_isu_vknr.vknr
         ->  Sort  (cost=28836.70..28876.79 rows=16037 width=19)
               Output: modwork_beleg.direction, modwork_beleg.id, modwork_isu_isu_gpnr.gpnr, modwork_isu_isu_pkunde.pkunde
               Sort Key: modwork_beleg.id
               ->  Merge Right Join  (cost=21128.57..27716.58 rows=16037 width=19)
                     Output: modwork_beleg.direction, modwork_beleg.id, modwork_isu_isu_gpnr.gpnr, modwork_isu_isu_pkunde.pkunde
                     Merge Cond: (modwork_isu_isu_gpnr.beleg_id = modwork_beleg.id)
                     ->  Index Scan using modwork_isu_isu_gpnr_beleg_id on modwork_isu_isu_gpnr  (cost=0.00..5883.73 rows=185491 width=12)
                           Output: modwork_isu_isu_gpnr.id, modwork_isu_isu_gpnr.beleg_id, modwork_isu_isu_gpnr.gpnr
                     ->  Sort  (cost=21128.57..21157.69 rows=11646 width=11)
                           Output: modwork_beleg.direction, modwork_beleg.id, modwork_isu_isu_pkunde.pkunde
                           Sort Key: modwork_beleg.id
                           ->  Merge Right Join  (cost=14555.56..20342.03 rows=11646 width=11)
                                 Output: modwork_beleg.direction, modwork_beleg.id, modwork_isu_isu_pkunde.pkunde
                                 Merge Cond: (modwork_isu_isu_pkunde.beleg_id = modwork_beleg.id)
                                 ->  Index Scan using modwork_isu_isu_pkunde_beleg_id on modwork_isu_isu_pkunde  (cost=0.00..5203.79 rows=163197 width=8)
                                       Output: modwork_isu_isu_pkunde.id, modwork_isu_isu_pkunde.beleg_id, modwork_isu_isu_pkunde.pkunde
                                 ->  Sort  (cost=14555.56..14573.39 rows=7134 width=7)
                                       Output: modwork_beleg.direction, modwork_beleg.id
                                       Sort Key: modwork_beleg.id
                                       ->  Bitmap Heap Scan on modwork_beleg  (cost=140.06..14098.97 rows=7134 width=7)
                                             Output: modwork_beleg.direction, modwork_beleg.id
                                             Recheck Cond: ((state)::text = ANY ('{neu,inarbeit,wiedervorlage}'::text[]))
                                             ->  Bitmap Index Scan on modwork_beleg_state  (cost=0.00..138.28 rows=7134 width=0)
                                                   Index Cond: ((state)::text = ANY ('{neu,inarbeit,wiedervorlage}'::text[]))
(32 rows)

My PostgreSQL version:

modwork_egs_q=> select version();
                                                                version                                                                 
----------------------------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.4.8 on x86_64-unknown-linux-gnu, compiled by GCC gcc (SUSE Linux) 4.5.0 20100604 [gcc-4_5-branch revision 160292], 64-bit

Best Answer

First, I simplified your syntax quite a bit to make it more readable:

SELECT DISTINCT b.direction
FROM   modwork_beleg b
LEFT   JOIN modwork_isu_isu_pkunde p ON b.id = p.beleg_id
LEFT   JOIN modwork_isu_isu_vknr   v ON b.id = v.beleg_id
LEFT   JOIN modwork_isu_isu_gpnr   g ON b.id = g.beleg_id
WHERE  b.state IN ('neu', 'inarbeit', 'wiedervorlage')
AND   (p.pkunde = 90237758
    OR v.vknr = 254400297729
    OR g.gpnr = 1001030921);

Next, and this is the important step concerning performance:

SELECT b.direction
FROM   modwork_beleg b
JOIN   modwork_isu_isu_pkunde p ON b.id = p.beleg_id
WHERE  b.state IN ('neu', 'inarbeit', 'wiedervorlage')
AND    p.pkunde = 90237758

UNION
SELECT b.direction
FROM   modwork_beleg b
JOIN   modwork_isu_isu_vknr   v ON b.id = v.beleg_id
WHERE  b.state IN ('neu', 'inarbeit', 'wiedervorlage')
AND    v.vknr = 254400297729

UNION
SELECT b.direction
FROM   modwork_beleg b
JOIN   modwork_isu_isu_gpnr   g ON b.id = g.beleg_id
WHERE  b.state IN ('neu', 'inarbeit', 'wiedervorlage')
AND    g.gpnr = 1001030921;

The point is, that in your original query you form one big table out of all possible combinations. If the tables modwork_isu_isu_* each have, say, 100 rows per beleg_id, this results in a huge table of 100x100x100 = 1 million rows, holding all the columns of all four tables. Then you select a couple of them (your query plan shows 32 resulting rows). This is extremely inefficient.

The good news is, your query can easily be broken into three parts. I did not test, but I'd bet this is faster by several orders of magnitude.

I also changed the join type from LEFT [OUTER] JOIN to [INNER] JOIN, because with three separate queries, the condition on the right table makes each a JOIN effectively anyway.

Finally, using UNION instead of UNION ALL when combining the results of the three queries removes duplicate values in b.direction. So I can drop the redundant DISTINCT for each of the queries. All simpler and much faster.