Suppose the following relations:
- match(match_id)
- event(match_id, seq, gt, …)
There are the following indexes:
- match(match_id)
- event(match_id, seq)
Further notes:
- gt is monotonically increasing
- For a given match there is a collection of events, each of which happens at a specific 'gt' time
- Both match and event are materialized views.
I am using PostgreSQL 13.1.
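For concreteness, the indexes described above could be declared roughly as follows (a sketch: the materialized view definitions themselves are omitted, and the uniqueness of both indexes is an assumption, not stated in the question):

```sql
-- B-tree indexes on the two materialized views (uniqueness assumed)
CREATE UNIQUE INDEX ON match (match_id);
CREATE UNIQUE INDEX ON event (match_id, seq);
```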
My goal is to come up with a RECURSIVE CTE query that calculates the delta between each event and the next one; however, I find it very slow.
While this can be solved practically with a self-join, I am not interested in that approach; I want to find out why my CTE is slow. I believe it should not be this slow.
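For reference, the practical self-join mentioned above might look like the sketch below (it assumes seq has no gaps within a match; a window function such as LAG() OVER (PARTITION BY match_id ORDER BY seq) would be an equivalent alternative):

```sql
-- delta between consecutive events via a self-join on seq
SELECT cur.match_id,
       cur.seq,
       cur.gt,
       cur.gt - prev.gt AS delta
FROM   event cur
JOIN   event prev
  ON   prev.match_id = cur.match_id
 AND   prev.seq      = cur.seq - 1;
```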
Further numbers:
- number of matches is 400
- each match has an average of 541 events
My RECURSIVE CTE query is the following:
WITH RECURSIVE
delta_gts AS (
    SELECT m.match_id, 1 AS seq, 0 AS gt, 0 AS delta
    FROM matches m
    UNION
    SELECT dgt.match_id, ev.seq AS seq, ev.gt AS gt, (ev.gt - dgt.gt) AS delta
    FROM delta_gts dgt
    JOIN events ev ON ev.match_id = dgt.match_id AND ev.seq = (dgt.seq + 1)
)
SELECT * FROM delta_gts g
Further note: I also tried adding the following (for one match only):
WHERE g.match_id = 'ita_1672780'
and I found in the plan that there is no predicate pushdown, although I think this was implemented in PostgreSQL 13.1.
This is the plan:
QUERY PLAN
CTE Scan on delta_gts g  (cost=160601.44..161032.40 rows=21548 width=76) (actual time=173.940..354185.831 rows=220268 loops=1)
  Buffers: shared hit=5453034 read=596370, temp read=1340253 written=1581611
  CTE delta_gts
    ->  Recursive Union  (cost=0.00..160601.44 rows=21548 width=76) (actual time=173.931..353944.926 rows=220268 loops=1)
          Buffers: shared hit=5453034 read=596370, temp read=1340253 written=1580590
          ->  Seq Scan on netcastingdocument_matches m  (cost=0.00..10.08 rows=408 width=28) (actual time=173.917..174.265 rows=408 loops=1)
                Buffers: shared hit=6
          ->  Hash Join  (cost=14121.22..16016.04 rows=2114 width=76) (actual time=259.550..305.356 rows=190 loops=1158)
                Hash Cond: ((dgt.match_id = ev.match_id) AND ((dgt.seq + 1) = ev.seq))
                Buffers: shared hit=5453028 read=596370, temp read=1340253 written=1580590
                ->  WorkTable Scan on delta_gts dgt  (cost=0.00..81.60 rows=4080 width=72) (actual time=0.005..0.067 rows=190 loops=1158)
                ->  Hash  (cost=8106.89..8106.89 rows=288289 width=24) (actual time=257.949..257.949 rows=288323 loops=1158)
                      Buckets: 65536  Batches: 8  Memory Usage: 2484kB
                      Buffers: shared hit=5453022 read=596370, temp written=1565616
                      ->  Seq Scan on netcastingdocument_events ev  (cost=0.00..8106.89 rows=288289 width=24) (actual time=0.016..92.171 rows=288323 loops=1158)
                            Buffers: shared hit=5453022 read=596370
Planning:
  Buffers: shared hit=107
Planning Time: 50.290 ms
JIT:
  Functions: 13
  Options: Inlining false, Optimization false, Expressions true, Deforming true
  Timing: Generation 4.108 ms, Inlining 0.000 ms, Optimization 19.158 ms, Emission 154.531 ms, Total 177.796 ms
Execution Time: 355489.930 ms
Considerations:
- The index on events (match_id, seq) is not used at all when the recursive part of the CTE is executed.
- Disabling sequential scans does the trick, as the index on events is then used.
After some investigation, it looks like the issue is that a sequential scan is performed to look up the next event, which is not right in this situation.
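The second consideration can be reproduced for a single session as sketched below; enable_seqscan is a diagnostic knob that merely discourages sequential scans, and should not be disabled globally:

```sql
SET enable_seqscan = off;      -- discourage sequential scans in this session only
EXPLAIN (ANALYZE, BUFFERS)
WITH RECURSIVE delta_gts AS (
    ...                        -- the recursive query from above
)
SELECT * FROM delta_gts;
RESET enable_seqscan;          -- restore the default afterwards
```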
Best Answer
There may be several causes; I cannot be certain, because you didn't post the
EXPLAIN (ANALYZE, BUFFERS)
output for both executions.

PostgreSQL might mis-estimate the row counts. Running
ANALYZE
as you did is a good approach here, but in a recursive CTE the row counts are often hard to predict, and it is hard to fix these estimates. If you don't mind a nasty trick, you could try adding another superfluous join condition to make PostgreSQL think that the result will have fewer rows:
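A sketch of that trick, under the assumption that gt is strictly increasing within a match (so the extra condition is always true at runtime, but it makes the planner expect fewer join rows):

```sql
-- recursive arm of the CTE with a superfluous join condition
SELECT dgt.match_id, ev.seq, ev.gt, (ev.gt - dgt.gt) AS delta
FROM delta_gts dgt
JOIN events ev
  ON ev.match_id = dgt.match_id
 AND ev.seq = dgt.seq + 1
 AND ev.gt > dgt.gt    -- redundant if gt is strictly increasing; lowers the row estimate
```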
PostgreSQL might price an index scan too high, which induces it to choose a sequential scan and a hash join instead of a nested loop join.
If you have an SSD as disk, you should lower
random_page_cost
to 1 or 1.1 to give the PostgreSQL optimizer an idea that index scans are not four times as expensive as sequential scans. If you have enough RAM, you should set
effective_cache_size
high enough that PostgreSQL knows the data are likely cached. That will also lower the cost of an index scan.
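Both settings can be tried per session first and then persisted; the values below are illustrative, not recommendations for any particular hardware:

```sql
SET random_page_cost = 1.1;         -- SSD: random reads are almost as cheap as sequential
SET effective_cache_size = '8GB';   -- assumption: size this to your actual RAM
-- once satisfied, persist the settings:
ALTER SYSTEM SET random_page_cost = 1.1;
ALTER SYSTEM SET effective_cache_size = '8GB';
SELECT pg_reload_conf();
```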