PostgreSQL Recursive CTE Index Usage – Why Recursive CTE Does Not Use an Index

Tags: postgresql, postgresql-13

Suppose the following relations:

  • match(match_id)
  • event(match_id, seq, gt, …)

There are the following indexes:

  • match(match_id)
  • event(match_id, seq)

Further notes:

  • gt is monotonically increasing
  • For a given match, I have a collection of events, each of which happens at a specific gt time
  • Both match and event are materialized views.

I am using PostgreSQL 13.1.

My goal is to come up with a RECURSIVE CTE query that calculates the delta between one event and the next, but I find it very slow.
While this can be solved practically with a self-join, I am not interested in that; I want to find out why my CTE is slow, because I believe it should not be.
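For reference, the self-join alternative alluded to above might look something like this (a sketch only, assuming the table and column names used elsewhere in this question):

```sql
-- Sketch of the self-join approach (not the point of the question):
-- pair each event with the event having the previous seq in the same match.
SELECT cur.match_id,
       cur.seq,
       cur.gt,
       cur.gt - prev.gt AS delta
FROM   events cur
JOIN   events prev
       ON prev.match_id = cur.match_id
      AND prev.seq = cur.seq - 1;
```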

Further numbers:

  • number of matches is 400
  • each match has an average of 541 events
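These numbers give a quick sanity check on the expected result size (using 408 matches as reported in the plan below, rather than the rounded 400):

```python
# Rough sanity check of the recursive CTE's output size from the numbers
# above: 408 matches appear in the plan, ~541 events per match on average.
matches = 408            # rows returned by the seq scan on matches
avg_events = 541         # average events per match (from the question)

# The CTE emits one anchor row per match (seq = 1) plus roughly one row
# per event reached through the recursion.
expected_rows = matches * (avg_events + 1)
print(expected_rows)  # 221136, close to the rows=220268 reported in the plan
```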

My RECURSIVE CTE query is the following:


WITH RECURSIVE
    delta_gts AS (
        SELECT m.match_id, 1 AS seq, 0 AS gt, 0 AS delta
        FROM matches m

        UNION

        SELECT dgt.match_id, ev.seq AS seq, ev.gt AS gt, (ev.gt - dgt.gt) AS delta
        FROM delta_gts dgt
        JOIN events ev ON ev.match_id = dgt.match_id AND ev.seq = (dgt.seq + 1)

    )

SELECT * FROM delta_gts g

Further note: I also tried adding the following (to restrict the query to one match):

WHERE g.match_id = 'ita_1672780'

and I found in the plan that there is no predicate pushdown. I thought this had been implemented in PostgreSQL 13.
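Since the planner will not push a predicate from the outer query into a recursive CTE, one workaround (a sketch, reusing the literal match_id from above) is to filter in the non-recursive term, so the recursion only ever starts from that one match:

```sql
WITH RECURSIVE
    delta_gts AS (
        SELECT m.match_id, 1 AS seq, 0 AS gt, 0 AS delta
        FROM matches m
        WHERE m.match_id = 'ita_1672780'  -- filter the anchor, not the outer query

        UNION

        SELECT dgt.match_id, ev.seq, ev.gt, ev.gt - dgt.gt AS delta
        FROM delta_gts dgt
        JOIN events ev
          ON ev.match_id = dgt.match_id
         AND ev.seq = dgt.seq + 1
    )
SELECT * FROM delta_gts;
```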

This is the plan:

QUERY PLAN
CTE Scan on delta_gts g  (cost=160601.44..161032.40 rows=21548 width=76) (actual time=173.940..354185.831 rows=220268 loops=1)
"  Buffers: shared hit=5453034 read=596370, temp read=1340253 written=1581611"
  CTE delta_gts
    ->  Recursive Union  (cost=0.00..160601.44 rows=21548 width=76) (actual time=173.931..353944.926 rows=220268 loops=1)
"          Buffers: shared hit=5453034 read=596370, temp read=1340253 written=1580590"
          ->  Seq Scan on netcastingdocument_matches m  (cost=0.00..10.08 rows=408 width=28) (actual time=173.917..174.265 rows=408 loops=1)
                Buffers: shared hit=6
          ->  Hash Join  (cost=14121.22..16016.04 rows=2114 width=76) (actual time=259.550..305.356 rows=190 loops=1158)
                Hash Cond: ((dgt.match_id = ev.match_id) AND ((dgt.seq + 1) = ev.seq))
"                Buffers: shared hit=5453028 read=596370, temp read=1340253 written=1580590"
                ->  WorkTable Scan on delta_gts dgt  (cost=0.00..81.60 rows=4080 width=72) (actual time=0.005..0.067 rows=190 loops=1158)
                ->  Hash  (cost=8106.89..8106.89 rows=288289 width=24) (actual time=257.949..257.949 rows=288323 loops=1158)
                      Buckets: 65536  Batches: 8  Memory Usage: 2484kB
"                      Buffers: shared hit=5453022 read=596370, temp written=1565616"
                      ->  Seq Scan on netcastingdocument_events ev  (cost=0.00..8106.89 rows=288289 width=24) (actual time=0.016..92.171 rows=288323 loops=1158)
                            Buffers: shared hit=5453022 read=596370
Planning:
  Buffers: shared hit=107
Planning Time: 50.290 ms
JIT:
  Functions: 13
"  Options: Inlining false, Optimization false, Expressions true, Deforming true"
"  Timing: Generation 4.108 ms, Inlining 0.000 ms, Optimization 19.158 ms, Emission 154.531 ms, Total 177.796 ms"
Execution Time: 355489.930 ms

Considerations:

  • It is not using the index (match_id, seq) on the events table at all when the recursive part of the CTE is executed.
  • Disabling seqscan does the trick as it will use the index for events.

After some investigation, it looks like the issue is that a sequential scan is performed to look up the next event, which is not right in my situation.
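The "disabling seqscan" experiment mentioned above can be done per session, so it does not affect other connections (a sketch; run the recursive query under EXPLAIN in between):

```sql
-- Discourage sequential scans for the current session only; the planner
-- then prefers the (match_id, seq) index for the recursive join.
SET enable_seqscan = off;

-- ... run EXPLAIN (ANALYZE, BUFFERS) on the recursive query here ...

RESET enable_seqscan;  -- restore the default afterwards
```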

Best Answer

There may be several causes; I cannot be certain, because you didn't post the EXPLAIN (ANALYZE, BUFFERS) output for both executions.

  • PostgreSQL might mis-estimate the row counts. Running ANALYZE as you did is a good approach here, but in a recursive CTE the row counts are often hard to predict, and it is hard to fix these estimates.

    If you don't mind a nasty trick, you could try adding another superfluous join condition to make PostgreSQL think that the result will have fewer rows:

    JOIN events ev
       ON ev.match_id = dgt.match_id
          AND ev.seq = dgt.seq + 1
          AND ev.seq - 1 = dgt.seq
    
  • PostgreSQL might estimate the cost of an index scan too high, which induces it to choose a sequential scan and a hash join instead of a nested loop join.

    • If your data are on an SSD, lower random_page_cost to 1 or 1.1 to tell the PostgreSQL optimizer that random (index) page reads are not four times as expensive as sequential reads.

    • If you have enough RAM, you should set effective_cache_size high enough, so that PostgreSQL knows that data are likely cached. That will also lower the cost of an index scan.
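The two settings above could be adjusted like this (the values are illustrative assumptions; pick effective_cache_size based on the RAM actually available for caching on your machine):

```sql
-- Assume SSD storage: random page reads cost about the same as sequential ones.
ALTER SYSTEM SET random_page_cost = 1.1;

-- Assume roughly 8 GB of RAM is available for caching (adjust to your machine).
ALTER SYSTEM SET effective_cache_size = '8GB';

SELECT pg_reload_conf();  -- apply both settings without restarting the server
```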