Postgresql – Getting by list of ids: `unnest() JOIN` vs `= ANY()`

performancepostgresql

Here I have an array of 42 ids from a table of 800,000 and I want to lookup the rows with those ids.

I know of two ways to do this (besides creating a lot of parameters — that one is harder to write — assume it's off the table):

unnest() JOIN

EXPLAIN ANALYZE
SELECT *
FROM unnest('{234095,2366560,234391,234480,239511,237504,239629,247756,239142,232372,2366563,238511,242756,243396,235301,240990,233823,234537,246556,240453,245295,238909,236157,233724,236514,245384,249595,239953,233381,232180,232242,232990,237354,2366572,232066,242180,239958,239036,2366578,2166609,2197037,236542}'::int[]) AS d(id)
  JOIN phi.patient AS p ON d.id = p.id;

 Nested Loop  (cost=0.43..345.25 rows=100 width=133) (actual time=0.049..0.326 rows=42 loops=1)
   ->  Function Scan on unnest d  (cost=0.00..1.00 rows=100 width=4) (actual time=0.015..0.025 rows=42 loops=1)
   ->  Index Scan using patient_pkey on patient p  (cost=0.42..3.44 rows=1 width=129) (actual time=0.006..0.006 rows=1 loops=42)
         Index Cond: (id = d.id)
 Planning Time: 0.200 ms
 Execution Time: 0.374 ms
(6 rows)

= ANY()

EXPLAIN ANALYZE
SELECT *
FROM phi.patient
WHERE id = ANY('{234095,2366560,234391,234480,239511,237504,239629,247756,239142,232372,2366563,238511,242756,243396,235301,240990,233823,234537,246556,240453,245295,238909,236157,233724,236514,245384,249595,239953,233381,232180,232242,232990,237354,2366572,232066,242180,239958,239036,2366578,2166609,2197037,236542}');

 Index Scan using patient_pkey on patient  (cost=0.42..119.51 rows=42 width=129) (actual time=0.049..0.258 rows=42 loops=1)
   Index Cond: (id = ANY ('{234095,2366560,234391,234480,239511,237504,239629,247756,239142,232372,2366563,238511,242756,243396,235301,240990,233823,234537,246556,240453,245295,238909,236157,233724,236514,245384,249595,239953,233381,232180,232242,232990,237354,2366572,232066,242180,239958,239036,2366578,2166609,2197037,236542}'::bigint[]))
 Planning Time: 0.162 ms
 Execution Time: 0.295 ms
(4 rows)

In this simple testing and testing with ~700 ids, I see no discernible difference between these approaches.

But they produce different query plans. And it past, I remember encountering performance differences (though unfortunately I can't seem to repro them now).

Are these two approaches materially different? Is there a reason to prefer one over the other?

One thing that I thought of was that unnest can work for a composite key lookup (searching by two indexed fields, not just one).

Best Answer

You can see one difference in the plans, one has an estimated row count of 100, the other has a (correct) estimated row count of 42. (Note that this is fixed in v12, where UNNEST has been taught to count its input size when given a constant array, rather than just blindly reporting an estimate of 100). This could become important if the difference were larger than 42 vs 100, and if you added further joins (or other complexities) to the query.

One thing that I thought of was that unnest can work for a composite key lookup (searching by two indexed fields, not just one).

How would you use UNNEST for this, where you can't use =ANY equivalently instead? Usually a VALUES list is used for this (which also has its own issues, which also differ from version to version.)