Context
I'm tuning a bulk UPDATE
which selects from another (large) table. My intention is to provide a covering index to support an index only scan of the source table. I realise the source table must be vacuumed to update its visibility map.
My investigations so far suggest the optimiser elects to index only scan the source table when the UPDATE
uses a correlated subquery, but appears to use a standard index scan when a join is used (UPDATE...FROM
). I'm asking this question to understand why.
I provide a simplified example here to illustrate the differences.
I'm using Postgres 9.6.8, but get very similar plans for 10.11 and 11.6. I have reproduced the plans on a vanilla 9.6 Postgres installation in Docker using the official image, and also on db<>fiddle here.
Setup
CREATE TABLE lookup (
surrogate_key BIGINT PRIMARY KEY,
natural_key TEXT NOT NULL UNIQUE,
data TEXT NOT NULL);
INSERT INTO lookup
SELECT id, 'nk'||id, random()::text
FROM generate_series(1,400000) id;
CREATE UNIQUE INDEX lookup_ix ON lookup(natural_key, surrogate_key);
VACUUM ANALYSE lookup;
CREATE TABLE target (
target_id BIGINT PRIMARY KEY,
lookup_natural_key TEXT NOT NULL,
lookup_surrogate_key BIGINT,
data TEXT NOT NULL
);
INSERT INTO target (target_id, lookup_natural_key, data)
SELECT id+1000, 'nk'||id, random()::text
FROM generate_series(1,1000) id;
ANALYSE target;
UPDATE using join
EXPLAIN (ANALYSE, VERBOSE, BUFFERS)
UPDATE target
SET lookup_surrogate_key = surrogate_key
FROM lookup
WHERE lookup_natural_key = natural_key;
Standard index scan on lookup_ix
– so heap blocks are read from lookup
table:
Update on public.target (cost=0.42..7109.00 rows=1000 width=54) (actual time=76.688..76.688 rows=0 loops=1)
Buffers: shared hit=8514 read=550 dirtied=16
-> Nested Loop (cost=0.42..7109.00 rows=1000 width=54) (actual time=0.050..62.493 rows=1000 loops=1)
Output: target.target_id, target.lookup_natural_key, lookup.surrogate_key, target.data, target.ctid, lookup.ctid
Buffers: shared hit=3479 read=535
-> Seq Scan on public.target (cost=0.00..19.00 rows=1000 width=40) (actual time=0.013..7.691 rows=1000 loops=1)
Output: target.target_id, target.lookup_natural_key, target.data, target.ctid
Buffers: shared hit=9
-> Index Scan using lookup_ix on public.lookup (cost=0.42..7.08 rows=1 width=22) (actual time=0.020..0.027 rows=1 loops=1000)
Output: lookup.surrogate_key, lookup.ctid, lookup.natural_key
Index Cond: (lookup.natural_key = target.lookup_natural_key)
Buffers: shared hit=3470 read=535
Planning time: 0.431 ms
Execution time: 76.826 ms
UPDATE using correlated subquery
EXPLAIN (ANALYSE, VERBOSE, BUFFERS)
UPDATE target
SET lookup_surrogate_key = (
SELECT surrogate_key
FROM lookup
WHERE lookup_natural_key = natural_key);
Index only scan on lookup_ix
as intended:
Update on public.target (cost=0.00..4459.00 rows=1000 width=47) (actual time=52.947..52.947 rows=0 loops=1)
Buffers: shared hit=8050 read=15 dirtied=16
-> Seq Scan on public.target (cost=0.00..4459.00 rows=1000 width=47) (actual time=0.052..40.306 rows=1000 loops=1)
Output: target.target_id, target.lookup_natural_key, (SubPlan 1), target.data, target.ctid
Buffers: shared hit=3015
SubPlan 1
-> Index Only Scan using lookup_ix on public.lookup (cost=0.42..4.44 rows=1 width=8) (actual time=0.013..0.019 rows=1 loops=1000)
Output: lookup.surrogate_key
Index Cond: (lookup.natural_key = target.lookup_natural_key)
Heap Fetches: 0
Buffers: shared hit=3006
Planning time: 0.130 ms
Execution time: 52.987 ms
db<>fiddle here
I understand that the queries are not logically identical (different behaviour when there a no/multiple rows in lookup
for a given natural_key
), but I'm surprised by the different usage of lookup_ix
.
Can anyone explain why the join version could not use an index only scan please?
Best Answer
An index only scan can only be used if all columns that the index scan returns are stored in the index.
Yours index
lookup_ix
contains only the columnsnatural_key
andsurrogate_key
, but from the execution plan you can see:So the index scan fetches the
ctid
the physical address of the row. This is because we are going to perform anUPDATE
; see this comment fromsrc\include/nodes/plannodes.h
:(
EvalPlanQual
is PostgreSQL a jargon term/function name for fetching the latest version of a row that will be locked or modified.)To ascertain that this is the case, try
EXPLAIN
ing the same query without the update, and you will see that the plan is the same, except thectid
s are not fetched, and it uses an index only scan.Now there is one thing that is surprising to me:
Every (leaf) index entry does contain the
ctid
of the row it references, so I don't see a reason why that cannot be fetched with an index only scan. I asked on the mailing list, but it seems that changing this might not be trivial.