PostgreSQL – Update with Data from Another Table Using Index Only Scan

postgresqlpostgresql-9.6postgresql-performanceupdate

Context

I'm tuning a bulk UPDATE which selects from another (large) table. My intention is to provide a covering index to support an index only scan of the source table. I realise the source table must be vacuumed to update its visibility map.

My investigations so far suggest the optimiser elects to index only scan the source table when the UPDATE uses a correlated subquery, but appears to use a standard index scan when a join is used (UPDATE...FROM). I'm asking this question to understand why.

I provide a simplified example here to illustrate the differences.

I'm using Postgres 9.6.8, but get very similar plans for 10.11 and 11.6. I have reproduced the plans on a vanilla 9.6 Postgres installation in Docker using the official image, and also on db<>fiddle here.

Setup

CREATE TABLE lookup (
    surrogate_key   BIGINT PRIMARY KEY,
    natural_key     TEXT NOT NULL UNIQUE,
    data            TEXT NOT NULL);

INSERT INTO lookup
SELECT id, 'nk'||id, random()::text
FROM generate_series(1,400000) id;

CREATE UNIQUE INDEX lookup_ix ON lookup(natural_key, surrogate_key);

VACUUM ANALYSE lookup;

CREATE TABLE target (
    target_id               BIGINT PRIMARY KEY,
    lookup_natural_key      TEXT NOT NULL,
    lookup_surrogate_key    BIGINT,
    data                    TEXT NOT NULL
);

INSERT INTO target (target_id, lookup_natural_key, data)
SELECT id+1000, 'nk'||id, random()::text
FROM generate_series(1,1000) id;

ANALYSE target;

UPDATE using join

EXPLAIN (ANALYSE, VERBOSE, BUFFERS)
UPDATE target
SET lookup_surrogate_key = surrogate_key
FROM lookup
WHERE lookup_natural_key = natural_key;

Standard index scan on lookup_ix – so heap blocks are read from lookup table:

Update on public.target  (cost=0.42..7109.00 rows=1000 width=54) (actual time=76.688..76.688 rows=0 loops=1)
  Buffers: shared hit=8514 read=550 dirtied=16
  ->  Nested Loop  (cost=0.42..7109.00 rows=1000 width=54) (actual time=0.050..62.493 rows=1000 loops=1)
        Output: target.target_id, target.lookup_natural_key, lookup.surrogate_key, target.data, target.ctid, lookup.ctid
        Buffers: shared hit=3479 read=535
        ->  Seq Scan on public.target  (cost=0.00..19.00 rows=1000 width=40) (actual time=0.013..7.691 rows=1000 loops=1)
              Output: target.target_id, target.lookup_natural_key, target.data, target.ctid
              Buffers: shared hit=9
        ->  Index Scan using lookup_ix on public.lookup  (cost=0.42..7.08 rows=1 width=22) (actual time=0.020..0.027 rows=1 loops=1000)
              Output: lookup.surrogate_key, lookup.ctid, lookup.natural_key
              Index Cond: (lookup.natural_key = target.lookup_natural_key)
              Buffers: shared hit=3470 read=535
Planning time: 0.431 ms
Execution time: 76.826 ms

UPDATE using correlated subquery

EXPLAIN (ANALYSE, VERBOSE, BUFFERS)
UPDATE target
SET lookup_surrogate_key = (
    SELECT surrogate_key
    FROM lookup
    WHERE lookup_natural_key = natural_key);

Index only scan on lookup_ix as intended:

Update on public.target  (cost=0.00..4459.00 rows=1000 width=47) (actual time=52.947..52.947 rows=0 loops=1)
  Buffers: shared hit=8050 read=15 dirtied=16
  ->  Seq Scan on public.target  (cost=0.00..4459.00 rows=1000 width=47) (actual time=0.052..40.306 rows=1000 loops=1)
        Output: target.target_id, target.lookup_natural_key, (SubPlan 1), target.data, target.ctid
        Buffers: shared hit=3015
        SubPlan 1
          ->  Index Only Scan using lookup_ix on public.lookup  (cost=0.42..4.44 rows=1 width=8) (actual time=0.013..0.019 rows=1 loops=1000)
                Output: lookup.surrogate_key
                Index Cond: (lookup.natural_key = target.lookup_natural_key)
                Heap Fetches: 0
                Buffers: shared hit=3006
Planning time: 0.130 ms
Execution time: 52.987 ms

db<>fiddle here

I understand that the queries are not logically identical (different behaviour when there a no/multiple rows in lookup for a given natural_key), but I'm surprised by the different usage of lookup_ix.

Can anyone explain why the join version could not use an index only scan please?

Best Answer

An index only scan can only be used if all columns that the index scan returns are stored in the index.

Yours index lookup_ix contains only the columns natural_key and surrogate_key, but from the execution plan you can see:

->  Index Scan using lookup_ix on public.lookup
      Output: lookup.surrogate_key, lookup.ctid, lookup.natural_key

So the index scan fetches the ctid the physical address of the row. This is because we are going to perform an UPDATE; see this comment from src\include/nodes/plannodes.h:

 * When doing UPDATE, DELETE, or SELECT FOR UPDATE/SHARE, we have to uniquely
 * identify all the source rows, not only those from the target relations, so
 * that we can perform EvalPlanQual rechecking at need.  For plain tables we
 * can just fetch the TID, much as for a target relation; this case is
 * represented by ROW_MARK_REFERENCE.  Otherwise (for example for VALUES or
 * FUNCTION scans) we have to copy the whole row value.  ROW_MARK_COPY is
 * pretty inefficient, since most of the time we'll never need the data; but
 * fortunately the overhead is usually not performance-critical in practice.
 * By default we use ROW_MARK_COPY for foreign tables, but if the FDW has
 * a concept of rowid it can request to use ROW_MARK_REFERENCE instead.
 * (Again, this probably doesn't make sense if a physical remote fetch is
 * needed, but for FDWs that map to local storage it might be credible.)

(EvalPlanQual is PostgreSQL a jargon term/function name for fetching the latest version of a row that will be locked or modified.)

To ascertain that this is the case, try EXPLAINing the same query without the update, and you will see that the plan is the same, except the ctids are not fetched, and it uses an index only scan.


Now there is one thing that is surprising to me:

Every (leaf) index entry does contain the ctid of the row it references, so I don't see a reason why that cannot be fetched with an index only scan. I asked on the mailing list, but it seems that changing this might not be trivial.