Postgresql – Extremely long query time after upgrading to Postgres 9.3 from 9.1

performancepostgresqlpostgresql-9.3query-performance

We're attempting to upgrade from Postgres 9.1 to Postgres 9.3 on a FreeBSD 9.2 zfs box. Performance is noticeably slower across the board, but we have one query that has gone from 353ms to >200000ms. We did a pg_dump and pg_restore so the data is the same. The query plans are the same. We can't figure out what is causing the time difference.

Query:

WITH RECURSIVE sub_products (
    sub_product_no
    ,product_no
    ,p_type
    ,p_location
    ,p2_location
    ,path
    ,CYCLE
    )
AS (
    SELECT p2.product_no sub_product
        ,p.product_no
        ,p.type
        ,p.location_no p_location
        ,p2.location_no p2_location
        ,ARRAY [p.product_no]
        ,FALSE
    FROM product p
    INNER JOIN section s ON s.product_no = p.product_no
    INNER JOIN packet_document pd ON pd.section_no = s.section_no
    INNER JOIN product p2 ON p2.doc_id = pd.doc_id
        AND (
            p2.location_no = p.location_no
            OR p.location_no = 0
            )
        AND CASE 
            WHEN p.document_type = 'MPCKT'
                THEN date_trunc('month', p.product_ts) = date_trunc('month', p2.product_ts)
            WHEN p.document_type = 'DPCKT'
                THEN p.product_ts::DATE = p2.product_ts::DATE
            WHEN p.document_type = 'WPCKT'
                THEN date_trunc('week', p.product_ts) = date_trunc('week', p2.product_ts)
            WHEN p.document_type = 'PPCKT'
                THEN p2.section_no = pd.section_no
            WHEN p.document_type = 'CPCKT'
                THEN p2.complete != 'true'
            END
        AND (
            p2.product_st != 'del'
            OR p2.product_st IS NULL
            )
    WHERE p.product_no = 37202

    UNION ALL

    SELECT p2.product_no sub_product
        ,p.product_no
        ,p2.type
        ,p.location_no p_location
        ,p2.location_no p2_location
        ,path || p.product_no
        ,p.product_no = ANY (path)
    FROM product p
    INNER JOIN section s ON s.product_no = p.product_no
    INNER JOIN packet_document pd ON pd.section_no = s.section_no
    INNER JOIN product p2 ON p2.doc_id = pd.doc_id
        AND CASE 
            WHEN p.document_type = 'MPCKT'
                THEN date_trunc('month', p.product_ts) = date_trunc('month', p2.product_ts)
            WHEN p.document_type = 'DPCKT'
                THEN p.product_ts::DATE = p2.product_ts::DATE
            WHEN p.document_type = 'WPCKT'
                THEN date_trunc('week', p.product_ts) = date_trunc('week', p2.product_ts)
            WHEN p.document_type = 'PPCKT'
                THEN p2.section_no = pd.section_no
            WHEN p.document_type = 'CPCKT'
                THEN p2.complete != 'true'
            END
        AND (
            p2.product_st != 'del'
            OR p2.product_st IS NULL
            )
        ,sub_products
    WHERE p.product_no = sub_products.sub_product_no
        AND NOT CYCLE
        AND p_type = 'PCKT'
    )
SELECT op.*
    ,sp.path
    ,sp.p_location
    ,sp.p2_location
FROM sub_products sp
INNER JOIN open_points op ON (
        op.product_no = sp.product_no
        OR op.product_no = sp.sub_product_no
        )
WHERE p_location = 0
    OR p_location = p2_location

And the explain analyze (9.3): http://explain.depesz.com/s/mnZv

"Nested Loop  (cost=15583541.40..24502315.78 rows=3645458 width=173) (actual time=202496.394..202496.394 rows=0 loops=1)"
"  CTE sub_products"
"    ->  Recursive Union  (cost=12002.21..15583540.83 rows=125001731 width=53) (actual time=46.197..202464.751 rows=10272 loops=1)"
"          ->  Hash Join  (cost=12002.21..15099.20 rows=1151 width=21) (actual time=46.194..73.404 rows=12 loops=1)"
"                Hash Cond: (p2.doc_id = pd.doc_id)"
"                Join Filter: CASE WHEN ((p.document_type)::text = 'MPCKT'::text) THEN (date_trunc('month'::text, p.product_ts) = date_trunc('month'::text, p2.product_ts)) WHEN ((p.document_type)::text = 'DPCKT'::text) THEN ((p.product_ts)::date = (p2.produ (...)"
"                Rows Removed by Join Filter: 142"
"                ->  Nested Loop  (cost=0.29..2882.07 rows=10592 width=50) (actual time=0.038..64.013 rows=46461 loops=1)"
"                      Join Filter: ((p2.location_no = p.location_no) OR (p.location_no = 0))"
"                      ->  Index Scan using product_pkey on product p  (cost=0.29..8.31 rows=1 width=25) (actual time=0.017..0.018 rows=1 loops=1)"
"                            Index Cond: (product_no = 37202)"
"                      ->  Seq Scan on product p2  (cost=0.00..2166.99 rows=47118 width=25) (actual time=0.009..26.714 rows=46461 loops=1)"
"                            Filter: (((product_st)::text <> 'del'::text) OR (product_st IS NULL))"
"                            Rows Removed by Filter: 374"
"                ->  Hash  (cost=11998.23..11998.23 rows=295 width=12) (actual time=0.065..0.065 rows=11 loops=1)"
"                      Buckets: 1024  Batches: 1  Memory Usage: 1kB"
"                      ->  Nested Loop  (cost=0.86..11998.23 rows=295 width=12) (actual time=0.032..0.055 rows=11 loops=1)"
"                            ->  Index Scan using x_section_product_no on section s  (cost=0.43..51.95 rows=200 width=8) (actual time=0.016..0.018 rows=3 loops=1)"
"                                  Index Cond: (product_no = 37202)"
"                            ->  Index Scan using x_packet_document_section_no on packet_document pd  (cost=0.43..59.55 rows=18 width=8) (actual time=0.007..0.010 rows=4 loops=3)"
"                                  Index Cond: (section_no = s.section_no)"
"          ->  Merge Join  (cost=1066370.48..1306840.70 rows=12500058 width=53) (actual time=99935.009..101178.728 rows=5130 loops=2)"
"                Merge Cond: ((pd_1.doc_id = p2_1.doc_id) AND (s_1.product_no = p_1.product_no))"
"                Join Filter: CASE WHEN ((p_1.document_type)::text = 'MPCKT'::text) THEN (date_trunc('month'::text, p_1.product_ts) = date_trunc('month'::text, p2_1.product_ts)) WHEN ((p_1.document_type)::text = 'DPCKT'::text) THEN ((p_1.product_ts)::date = (...)"
"                Rows Removed by Join Filter: 59256"
"                ->  Sort  (cost=907378.84..919925.28 rows=5018577 width=12) (actual time=14320.515..15197.544 rows=5013201 loops=2)"
"                      Sort Key: pd_1.doc_id, s_1.product_no"
"                      Sort Method: quicksort  Memory: 409757kB"
"                      ->  Hash Join  (cost=118543.85..348840.16 rows=5018577 width=12) (actual time=1637.833..10102.077 rows=5013205 loops=2)"
"                            Hash Cond: (pd_1.section_no = s_1.section_no)"
"                            ->  Seq Scan on packet_document pd_1  (cost=0.00..129924.77 rows=5018577 width=8) (actual time=0.008..1084.765 rows=5016548 loops=2)"
"                            ->  Hash  (cost=75967.82..75967.82 rows=3406082 width=8) (actual time=3268.947..3268.947 rows=3405911 loops=1)"
"                                  Buckets: 524288  Batches: 1  Memory Usage: 133044kB"
"                                  ->  Seq Scan on section s_1  (cost=0.00..75967.82 rows=3406082 width=8) (actual time=0.006..1352.507 rows=3405911 loops=1)"
"                ->  Sort  (cost=158991.46..162407.52 rows=1366422 width=86) (actual time=73496.575..81944.111 rows=9896192 loops=2)"
"                      Sort Key: p2_1.doc_id, p_1.product_no"
"                      Sort Method: external sort  Disk: 2068272kB"
"                      ->  Nested Loop  (cost=0.29..19739.59 rows=1366422 width=86) (actual time=0.309..5643.939 rows=9896193 loops=2)"
"                            ->  Seq Scan on product p2_1  (cost=0.00..2166.99 rows=47118 width=30) (actual time=0.019..58.521 rows=46461 loops=2)"
"                                  Filter: (((product_st)::text <> 'del'::text) OR (product_st IS NULL))"
"                                  Rows Removed by Filter: 374"
"                            ->  Materialize  (cost=0.29..492.40 rows=29 width=56) (actual time=0.000..0.033 rows=213 loops=92922)"
"                                  ->  Nested Loop  (cost=0.29..492.25 rows=29 width=56) (actual time=0.282..1.258 rows=213 loops=2)"
"                                        ->  WorkTable Scan on sub_products  (cost=0.00..258.97 rows=29 width=36) (actual time=0.240..0.771 rows=213 loops=2)"
"                                              Filter: ((NOT cycle) AND ((p_type)::text = 'PCKT'::text))"
"                                              Rows Removed by Filter: 4923"
"                                        ->  Index Scan using product_pkey on product p_1  (cost=0.29..8.03 rows=1 width=20) (actual time=0.002..0.002 rows=1 loops=426)"
"                                              Index Cond: (product_no = sub_products.sub_product_no)"
"  ->  CTE Scan on sub_products sp  (cost=0.00..3125043.28 rows=1246892 width=48) (actual time=46.205..202482.836 rows=1144 loops=1)"
"        Filter: ((p_location = 0) OR (p_location = p2_location))"
"        Rows Removed by Filter: 9128"
"  ->  Bitmap Heap Scan on open_points op  (cost=0.57..4.62 rows=3 width=133) (actual time=0.007..0.007 rows=0 loops=1144)"
"        Recheck Cond: ((product_no = sp.product_no) OR (product_no = sp.sub_product_no))"
"        ->  BitmapOr  (cost=0.57..0.57 rows=3 width=0) (actual time=0.006..0.006 rows=0 loops=1144)"
"              ->  Bitmap Index Scan on open_point_product_no  (cost=0.00..0.29 rows=1 width=0) (actual time=0.003..0.003 rows=0 loops=1144)"
"                    Index Cond: (product_no = sp.product_no)"
"              ->  Bitmap Index Scan on open_point_product_no  (cost=0.00..0.29 rows=1 width=0) (actual time=0.002..0.002 rows=0 loops=1144)"
"                    Index Cond: (product_no = sp.sub_product_no)"
"Total runtime: 202742.579 ms"

Explain Anaylyze (9.1): http://explain.depesz.com/s/wT4

"Nested Loop  (cost=20654165.25..29843606.20 rows=3785336 width=173) (actual time=2551.031..2551.031 rows=0 loops=1)"
"  CTE sub_products"
"    ->  Recursive Union  (cost=12076.64..20654164.73 rows=129798066 width=53) (actual time=19.290..2535.724 rows=10371 loops=1)"
"          ->  Hash Join  (cost=12076.64..15192.11 rows=1226 width=21) (actual time=19.288..29.770 rows=12 loops=1)"
"                Hash Cond: (p2.doc_id = pd.doc_id)"
"                Join Filter: CASE WHEN ((p.document_type)::text = 'MPCKT'::text) THEN (date_trunc('month'::text, p.product_ts) = date_trunc('month'::text, p2.product_ts)) WHEN ((p.document_type)::text = 'DPCKT'::text) THEN ((p.product_ts)::date = (p2.produ (...)"
"                ->  Nested Loop  (cost=0.00..2900.90 rows=10419 width=49) (actual time=0.019..25.936 rows=45955 loops=1)"
"                      Join Filter: ((p2.location_no = p.location_no) OR (p.location_no = 0))"
"                      ->  Index Scan using product_pkey on product p  (cost=0.00..8.27 rows=1 width=25) (actual time=0.008..0.010 rows=1 loops=1)"
"                            Index Cond: (product_no = 37202)"
"                      ->  Seq Scan on product p2  (cost=0.00..2199.10 rows=46235 width=24) (actual time=0.005..13.207 rows=45955 loops=1)"
"                            Filter: (((product_st)::text <> 'del'::text) OR (product_st IS NULL))"
"                ->  Hash  (cost=12073.05..12073.05 rows=287 width=12) (actual time=0.032..0.032 rows=11 loops=1)"
"                      Buckets: 1024  Batches: 1  Memory Usage: 1kB"
"                      ->  Nested Loop  (cost=0.00..12073.05 rows=287 width=12) (actual time=0.017..0.026 rows=11 loops=1)"
"                            ->  Index Scan using x_section_product_no on section s  (cost=0.00..52.36 rows=194 width=8) (actual time=0.009..0.011 rows=3 loops=1)"
"                                  Index Cond: (product_no = 37202)"
"                            ->  Index Scan using x_packet_document_section_no on packet_document pd  (cost=0.00..61.72 rows=19 width=8) (actual time=0.004..0.005 rows=4 loops=3)"
"                                  Index Cond: (section_no = s.section_no)"
"          ->  Merge Join  (cost=311405.90..1804301.13 rows=12979684 width=53) (actual time=1198.546..1250.898 rows=5180 loops=2)"
"                Merge Cond: (p2.doc_id = pd.doc_id)"
"                Join Filter: CASE WHEN ((p.document_type)::text = 'MPCKT'::text) THEN (date_trunc('month'::text, p.product_ts) = date_trunc('month'::text, p2.product_ts)) WHEN ((p.document_type)::text = 'DPCKT'::text) THEN ((p.product_ts)::date = (p2.produ (...)"
"                ->  Sort  (cost=5781.55..5897.14 rows=46235 width=29) (actual time=33.074..38.250 rows=43512 loops=2)"
"                      Sort Key: p2.doc_id"
"                      Sort Method: quicksort  Memory: 5127kB"
"                      ->  Seq Scan on product p2  (cost=0.00..2199.10 rows=46235 width=29) (actual time=0.006..14.607 rows=45955 loops=2)"
"                            Filter: (((product_st)::text <> 'del'::text) OR (product_st IS NULL))"
"                ->  Sort  (cost=305624.35..307335.61 rows=684504 width=60) (actual time=1156.466..1159.650 rows=63702 loops=2)"
"                      Sort Key: pd.doc_id"
"                      Sort Method: quicksort  Memory: 231kB"
"                      ->  Hash Join  (cost=61697.02..239279.83 rows=684504 width=60) (actual time=216.507..1155.788 rows=1040 loops=2)"
"                            Hash Cond: (pd.section_no = s.section_no)"
"                            ->  Seq Scan on packet_document pd  (cost=0.00..152189.92 rows=4946092 width=8) (actual time=0.004..404.135 rows=4994133 loops=2)"
"                            ->  Hash  (cost=55916.76..55916.76 rows=462421 width=56) (actual time=5.258..5.258 rows=400 loops=2)"
"                                  Buckets: 65536  Batches: 1  Memory Usage: 36kB"
"                                  ->  Nested Loop  (cost=0.00..55916.76 rows=462421 width=56) (actual time=0.271..5.090 rows=400 loops=2)"
"                                        ->  Nested Loop  (cost=0.00..524.76 rows=31 width=56) (actual time=0.251..4.392 rows=213 loops=2)"
"                                              ->  WorkTable Scan on sub_products  (cost=0.00..275.85 rows=31 width=36) (actual time=0.228..0.758 rows=213 loops=2)"
"                                                    Filter: ((NOT cycle) AND ((p_type)::text = 'PCKT'::text))"
"                                              ->  Index Scan using product_pkey on product p  (cost=0.00..8.02 rows=1 width=20) (actual time=0.016..0.017 rows=1 loops=426)"
"                                                    Index Cond: (product_no = sub_products.sub_product_no)"
"                                        ->  Index Scan using x_section_product_no on section s  (cost=0.00..1600.38 rows=14917 width=8) (actual time=0.002..0.003 rows=2 loops=426)"
"                                              Index Cond: (product_no = p.product_no)"
"  ->  CTE Scan on sub_products sp  (cost=0.00..3244951.65 rows=1294736 width=48) (actual time=19.296..2545.674 rows=1155 loops=1)"
"        Filter: ((p_location = 0) OR (p_location = p2_location))"
"  ->  Bitmap Heap Scan on open_points op  (cost=0.52..4.55 rows=3 width=133) (actual time=0.003..0.003 rows=0 loops=1155)"
"        Recheck Cond: ((product_no = sp.product_no) OR (product_no = sp.sub_product_no))"
"        ->  BitmapOr  (cost=0.52..0.52 rows=3 width=0) (actual time=0.003..0.003 rows=0 loops=1155)"
"              ->  Bitmap Index Scan on open_point_product_no  (cost=0.00..0.26 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1155)"
"                    Index Cond: (product_no = sp.product_no)"
"              ->  Bitmap Index Scan on open_point_product_no  (cost=0.00..0.26 rows=1 width=0) (actual time=0.001..0.001 rows=0 loops=1155)"
"                    Index Cond: (product_no = sp.sub_product_no)"
"Total runtime: 2553.328 ms"

Best Answer

The key difference is in 9.3:

"                ->  Sort  (cost=158991.46..162407.52 rows=1366422 width=86) (actual time=73496.575..81944.111 rows=9896192 loops=2)"
"                      Sort Key: p2_1.doc_id, p_1.product_no"
"                      Sort Method: external sort  Disk: 2068272kB"

vs in 9.1:

"                ->  Sort  (cost=305624.35..307335.61 rows=684504 width=60) (actual time=1156.466..1159.650 rows=63702 loops=2)"
"                      Sort Key: pd.doc_id"
"                      Sort Method: quicksort  Memory: 231kB"

probably because 9.3 has underestimated the number of rows output by the nodes underneath by a factor of 7, while 9.1 has overestimated them by a factor of 10.

Consider bumping work_mem up a bunch; throw say 10MB of work_mem at the query.