Postgresql – How to get performance using PostgreSQL CTE recursive

ctepostgresqltree

I did a tree structure using id and parent_id in the same table. For query's I'm using CTE provide by PostgreSQL, but it's spend so much time to do the joins with recursive results. For example, by the time I have 100 records on sadt_lot table, and this query is spend 8 seconds to return the results. Someone have a better idea to do it?

Example: Listing all sadt's grouped by root sadt_lot's

EXPLAIN ANALYZE WITH RECURSIVE downlots as (
        SELECT sl1.sadt_lot_id, 0 AS level, sl1.sadt_lot_id as root_id
        FROM sadt_lot sl1
        WHERE sl1.parent_lot_id IS NULL
    UNION
        SELECT sl2.sadt_lot_id, d.level + 1, d.sadt_lot_id as root_id
        FROM sadt_lot sl2
        INNER JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
)
SELECT sl.sadt_lot_id, array_agg(s.sadt_id)
FROM sadt_lot sl
LEFT JOIN sadt s ON s.sadt_lot_id = any(SELECT sadt_lot_id FROM downlots WHERE root_id = sl.sadt_lot_id)
WHERE sl.parent_lot_id IS NULL 
group by sl.sadt_lot_id
ORDEr By sl.sadt_lot_id

Query Plan

GroupAggregate  (cost=42.53..15077.74 rows=1 width=36) (actual time=104.090..8436.505 rows=90 loops=1)
  Group Key: sl.sadt_lot_id
  CTE downlots
    ->  Recursive Union  (cost=0.00..42.39 rows=101 width=12) (actual time=0.006..0.104 rows=95 loops=1)
          ->  Seq Scan on sadt_lot sl1  (cost=0.00..2.94 rows=1 width=12) (actual time=0.005..0.019 rows=90 loops=1)
                Filter: (parent_lot_id IS NULL)
                Rows Removed by Filter: 5
          ->  Hash Join  (cost=0.33..3.74 rows=10 width=12) (actual time=0.027..0.028 rows=2 loops=2)
                Hash Cond: (sl2.parent_lot_id = d.sadt_lot_id)
                ->  Seq Scan on sadt_lot sl2  (cost=0.00..2.94 rows=94 width=8) (actual time=0.002..0.008 rows=95 loops=2)
                ->  Hash  (cost=0.20..0.20 rows=10 width=8) (actual time=0.010..0.010 rows=48 loops=2)
                      Buckets: 1024  Batches: 1  Memory Usage: 9kB
                      ->  WorkTable Scan on downlots d  (cost=0.00..0.20 rows=10 width=8) (actual time=0.001..0.004 rows=48 loops=2)
  ->  Nested Loop Left Join  (cost=0.14..15004.14 rows=6242 width=8) (actual time=8.234..8434.229 rows=11345 loops=1)
        Join Filter: (SubPlan 2)
        Rows Removed by Join Filter: 1112125
        ->  Index Only Scan using sadt_lot_sadt_lot_id_parent_lot_id_idx on sadt_lot sl  (cost=0.14..12.86 rows=1 width=4) (actual time=0.011..0.252 rows=90 loops=1)
              Index Cond: (parent_lot_id IS NULL)
              Heap Fetches: 90
        ->  Seq Scan on sadt s  (cost=0.00..635.83 rows=12483 width=8) (actual time=0.002..1.785 rows=12483 loops=90)
        SubPlan 2
          ->  CTE Scan on downlots  (cost=0.00..2.27 rows=1 width=4) (actual time=0.003..0.007 rows=1 loops=1123470)
                Filter: (root_id = sl.sadt_lot_id)
                Rows Removed by Filter: 94
Planning time: 0.203 ms
Execution time: 8436.598 ms

Best Answer

I found the solution. I was using the recursive expression how parameter to join and it was do several lops on the table used on join, a better aprouch is before join with this table(sadt), do the join with recursive expression(downlots "table") and after, using result, do join with the sadt, with that, the query jump from 8sec to 8ms. Follow the solution:

EXPLAIN ANALYZE SELECT sl.sadt_lot_id, array_agg(s.sadt_id)
FROM sadt_lot sl
LEFT JOIN (WITH RECURSIVE downlots as (
        SELECT sl1.sadt_lot_id, 0 AS level, sl1.sadt_lot_id as root_id
        FROM sadt_lot sl1
        WHERE sl1.parent_lot_id IS NULL
    UNION
        SELECT sl2.sadt_lot_id, d.level + 1, d.sadt_lot_id as root_id
        FROM sadt_lot sl2
        INNER JOIN downlots d ON d.sadt_lot_id = sl2.parent_lot_id
)SELECT * FROM downlots) d ON d.sadt_lot_id = sl.sadt_lot_id
LEFT JOIN sadt s ON s.sadt_lot_id = d.root_id
WHERE sl.parent_lot_id IS NULL 
group by sl.sadt_lot_id
ORDEr By sl.sadt_lot_id

Query Plan

Sort  (cost=1935.35..1935.56 rows=82 width=36) (actual time=8.230..8.234 rows=82 loops=1)
Sort Key: sl.sadt_lot_id
Sort Method: quicksort  Memory: 75kB
->  HashAggregate  (cost=1931.72..1932.74 rows=82 width=36) (actual time=8.085..8.197 rows=82 loops=1)
    Group Key: sl.sadt_lot_id
    ->  Hash Right Join  (cost=469.73..1839.25 rows=18493 width=8) (actual time=0.328..6.273 rows=10742 loops=1)
          Hash Cond: (s.sadt_lot_id = downlots.root_id)
          ->  Seq Scan on sadt s  (cost=0.00..645.78 rows=12678 width=8) (actual time=0.007..1.406 rows=12493 loops=1)
          ->  Hash  (cost=465.72..465.72 rows=321 width=8) (actual time=0.242..0.242 rows=82 loops=1)
                Buckets: 1024  Batches: 1  Memory Usage: 12kB
                ->  Hash Right Join  (cost=432.42..465.72 rows=321 width=8) (actual time=0.049..0.232 rows=82 loops=1)
                      Hash Cond: (downlots.sadt_lot_id = sl.sadt_lot_id)
                      ->  CTE Scan on downlots  (cost=428.41..444.05 rows=782 width=12) (actual time=0.007..0.167 rows=96 loops=1)
                            CTE downlots
                              ->  Recursive Union  (cost=0.00..428.41 rows=782 width=12) (actual time=0.006..0.143 rows=96 loops=1)
                                    ->  Seq Scan on sadt_lot sl1  (cost=0.00..2.99 rows=82 width=12) (actual time=0.004..0.018 rows=82 loops=1)
                                          Filter: (parent_lot_id IS NULL)
                                          Rows Removed by Filter: 14
                                    ->  Hash Join  (cost=4.23..40.98 rows=70 width=12) (actual time=0.030..0.031 rows=5 loops=3)
                                          Hash Cond: (d.sadt_lot_id = sl2.parent_lot_id)
                                          ->  WorkTable Scan on downlots d  (cost=0.00..16.40 rows=820 width=8) (actual time=0.000..0.002 rows=32 loops=3)
                                          ->  Hash  (cost=2.99..2.99 rows=99 width=8) (actual time=0.069..0.069 rows=14 loops=1)
                                                Buckets: 1024  Batches: 1  Memory Usage: 9kB
                                                ->  Seq Scan on sadt_lot sl2  (cost=0.00..2.99 rows=99 width=8) (actual time=0.004..0.061 rows=96 loops=1)
                      ->  Hash  (cost=2.99..2.99 rows=82 width=4) (actual time=0.039..0.039 rows=82 loops=1)
                            Buckets: 1024  Batches: 1  Memory Usage: 11kB
                            ->  Seq Scan on sadt_lot sl  (cost=0.00..2.99 rows=82 width=4) (actual time=0.014..0.028 rows=82 loops=1)
                                  Filter: (parent_lot_id IS NULL)
                                  Rows Removed by Filter: 14
Planning time: 0.225 ms
Execution time: 8.300 ms