Postgresql – Optimize and understand this Query Explain – POSTGRESQL

explainoptimizationperformancepostgresql-9.1query-performance

I have this query which is working perfectly fine. But This takes atleast 10 minutes to finish the query. And I have approximately 12m to 19million records in each table.

Need Help to optimize it asmuch as possible.

This query is taking same time even after creating index on c_invoiceline_id(qtyinvoiced)

SELECT distinct inv.c_invoice_id FROM c_invoice inv
JOIN c_invoiceline invl on invl.c_invoice_id=inv.c_invoice_id and invl.qtyinvoiced < 0
JOIN c_orderline ordl on ordl.c_orderline_id = invl.c_orderline_id
JOIN c_invoicetax invtax on invtax.c_invoice_id=inv.c_invoice_id
JOIN (select c_invoice_id from c_invoiceline 
    group by c_invoice_id having SUM(CASE WHEN qtyinvoiced > 0 THEN 1 ELSE 0 END) > 0 and SUM(CASE WHEN qtyinvoiced < 0 THEN 1 ELSE 0 END) > 0) as a
    on a.c_invoice_id=inv.c_invoice_id
JOIN (select oline.c_order_id from c_orderline oline
    join c_order ord on ord.c_order_id=oline.c_order_id
    where ord.em_obpos_applications_id is not null
    group by oline.c_order_id having SUM(CASE WHEN oline.qtyordered < 0 THEN 1 ELSE 0 END)=COUNT(oline.qtyordered)) ord 
    on ord.c_order_id=ordl.c_order_id
WHERE inv.issotrx='Y' AND inv.processed='Y'
and inv.em_fet_docstatus is null 
group by inv.c_invoice_id having sum(invtax.taxamt)<0

This query will get all the invoices which have lines both positive and negative quantity(qtyinvoiced) and also the orders linked to it should have all negative lines (c_orderline_id->c_order (this order should have all negative quantities (qtyordered) with additional condition having sum of invoice taxamount <0.

EXPLAIN ANALYZE:

Unique  (cost=10825659.85..10833993.63 rows=209630 width=39) (actual time=513931.305..513986.433 rows=283 loops=1)
  ->  GroupAggregate  (cost=10825659.85..10833469.56 rows=209630 width=39) (actual time=513931.304..513986.324 rows=283 loops=1)
        Filter: (sum(invtax.taxamt) < 0::numeric)
        Rows Removed by Filter: 829
        ->  Sort  (cost=10825659.85..10827389.63 rows=691911 width=39) (actual time=513930.580..513954.442 rows=76950 loops=1)
              Sort Key: inv.c_invoice_id
              Sort Method: external merge  Disk: 3712kB
              ->  Hash Join  (cost=10439166.75..10739622.70 rows=691911 width=39) (actual time=476487.224..512926.969 rows=76950 loops=1)
                    Hash Cond: ((invl.c_invoice_id)::text = (invtax.c_invoice_id)::text)
                    ->  Merge Join  (cost=10099065.81..10325822.99 rows=39599 width=99) (actual time=462189.711..499417.438 rows=8632 loops=1)
                          Merge Cond: ((oline.c_order_id)::text = (ordl.c_order_id)::text)
                          ->  GroupAggregate  (cost=6943261.18..7167377.39 rows=155839 width=38) (actual time=348397.360..385628.774 rows=54672 loops=1)
                                Filter: (sum(CASE WHEN (oline.qtyordered < 0::numeric) THEN 1 ELSE 0 END) = count(oline.qtyordered))
                                Rows Removed by Filter: 6247490
                                ->  Sort  (cost=6943261.18..6987694.82 rows=17773458 width=38) (actual time=348396.557..379344.229 rows=14875695 loops=1)
                                      Sort Key: oline.c_order_id
                                      Sort Method: external merge  Disk: 698416kB
                                      ->  Hash Join  (cost=1517792.23..3831058.59 rows=17773458 width=38) (actual time=85920.487..194792.145 rows=14878581 loops=1)
                                            Hash Cond: ((oline.c_order_id)::text = (ord.c_order_id)::text)
                                            ->  Seq Scan on c_orderline oline  (cost=0.00..1595801.39 rows=19157939 width=38) (actual time=0.554..86407.273 rows=18538376 loops=1)
                                            ->  Hash  (cost=1390894.74..1390894.74 rows=6247239 width=33) (actual time=85914.971..85914.971 rows=6303476 loops=1)
                                                  Buckets: 8192  Batches: 128  Memory Usage: 3164kB
                                                  ->  Seq Scan on c_order ord  (cost=0.00..1390894.74 rows=6247239 width=33) (actual time=2.070..82863.884 rows=6303476 loops=1)
                                                        Filter: (em_obpos_applications_id IS NOT NULL)
                                                        Rows Removed by Filter: 499796
                          ->  Materialize  (cost=3155804.63..3156002.62 rows=39599 width=132) (actual time=113733.044..113739.700 rows=9885 loops=1)
                                ->  Sort  (cost=3155804.63..3155903.63 rows=39599 width=132) (actual time=113733.040..113735.749 rows=9885 loops=1)
                                      Sort Key: ordl.c_order_id
                                      Sort Method: quicksort  Memory: 2958kB
                                      ->  Nested Loop  (cost=2026157.81..3150071.62 rows=39599 width=132) (actual time=95154.815..113679.957 rows=9885 loops=1)
                                            ->  Hash Join  (cost=2026157.24..2820544.84 rows=49856 width=132) (actual time=95153.418..105945.254 rows=9898 loops=1)
                                                  Hash Cond: ((invl.c_invoice_id)::text = (inv.c_invoice_id)::text)
                                                  ->  Bitmap Heap Scan on c_invoiceline invl  (cost=6407.70..799503.15 rows=342098 width=66) (actual time=86.177..10752.037 rows=334268 loops=1)
                                                        Recheck Cond: (qtyinvoiced < 0::numeric)
                                                        Rows Removed by Index Recheck: 629137
                                                        ->  Bitmap Index Scan on qtyinvoicedindex  (cost=0.00..6322.17 rows=342098 width=0) (actual time=76.845..76.845 rows=334268 loops=1)
                                                              Index Cond: (qtyinvoiced < 0::numeric)
                                                  ->  Hash  (cost=2019700.82..2019700.82 rows=3898 width=66) (actual time=95065.477..95065.477 rows=1120 loops=1)
                                                        Buckets: 1024  Batches: 1  Memory Usage: 108kB
                                                        ->  Nested Loop  (cost=1829544.06..2019700.82 rows=3898 width=66) (actual time=87786.269..95063.804 rows=1120 loops=1)
                                                              ->  HashAggregate  (cost=1829543.51..1829944.69 rows=26745 width=38) (actual time=87769.457..88276.901 rows=22293 loops=1)
                                                                    Filter: ((sum(CASE WHEN (c_invoiceline.qtyinvoiced > 0::numeric) THEN 1 ELSE 0 END) > 0) AND (sum(CASE WHEN (c_invoiceline.qtyinvoiced < 0::numeric) THEN 1 ELSE 0 END) > 0))
                                                                    Rows Removed by Filter: 1336779
                                                                    ->  Seq Scan on c_invoiceline  (cost=0.00..1580251.56 rows=19943356 width=38) (actual time=0.516..69047.673 rows=19929036 loops=1)
                                                              ->  Index Scan using c_invoice_key on c_invoice inv  (cost=0.55..7.08 rows=1 width=33) (actual time=0.303..0.303 rows=0 loops=22293)
                                                                    Index Cond: ((c_invoice_id)::text = (c_invoiceline.c_invoice_id)::text)
                                                                    Filter: ((em_fet_docstatus IS NULL) AND (issotrx = 'Y'::bpchar) AND (processed = 'Y'::bpchar))
                                                                    Rows Removed by Filter: 1
                                            ->  Index Scan using c_orderline_key on c_orderline ordl  (cost=0.56..6.60 rows=1 width=66) (actual time=0.774..0.779 rows=1 loops=9898)
                                                  Index Cond: ((c_orderline_id)::text = (invl.c_orderline_id)::text)
                    ->  Hash  (cost=257336.31..257336.31 rows=4074531 width=39) (actual time=11802.455..11802.455 rows=4180592 loops=1)
                          Buckets: 8192  Batches: 128  Memory Usage: 2359kB
                          ->  Seq Scan on c_invoicetax invtax  (cost=0.00..257336.31 rows=4074531 width=39) (actual time=0.003..10083.002 rows=4180592 loops=1)
Total runtime: 514125.669 ms

click here for explain

Best Answer

  1. Work from inside out on the query plan. Anything you see a seq scan on, fix. Look at adding indexes
  2. Use count(). If you're sum(CASE cond THEN 1)ing, then just count(cond)
  3. If you're reducing the count to a bool case, then use a boolean aggregate. rather than count(cond) > 0, just bool_or(cond)
  4. Think about some of your query. What does this even do? I mean on this one you had me thinking he has to be joking... It's not that's it's invalid, it's just crazy complex.

    group by c_invoice_id having SUM(CASE WHEN qtyinvoiced > 0 THEN 1 ELSE 0 END) > 0 and SUM(CASE WHEN qtyinvoiced < 0 THEN 1 ELSE 0 END) > 0) as a
    
    --can be rewitten
    group by c_invoice_id having count(qtyinvoiced > 0) > 0 and count(qtyinvoiced < 0) > 0) as a
    
    --can be 
    group by c_invoice_id having bool_or(qtyinvoiced > 0) and bool_or(qtyinvoiced < 0) as a
    

Also consider using USING (a), rather than ON (l.a = r.a)