I have found a way to do this efficiently, using lateral joins instead of ordinary subqueries. As a bonus, the SQL is also easier to read.
SELECT b.experiment_id, b.url_id, x.full_url_id, y.blockpage_reason_id
FROM blockpage b
JOIN LATERAL (
SELECT experiment_id, url_id, full_url_id
FROM blockpage
WHERE experiment_id = b.experiment_id AND url_id = b.url_id
ORDER BY redirect_num DESC
LIMIT 1
) x ON TRUE
LEFT JOIN LATERAL (
SELECT experiment_id, url_id, blockpage_reason_id
FROM blockpage
WHERE experiment_id = b.experiment_id AND url_id = b.url_id AND has_blockpage_regex
ORDER BY redirect_num ASC
LIMIT 1
) y ON TRUE
WHERE b.experiment_id IN (16, 43);
The documentation describes
"LATERAL" as letting you refer, in a subquery expression, to tables from earlier entries in the FROM-list, and that's true, but saying it that way doesn't really clue one in as to why one might want that. Its real power is that it directs the database to run a subquery separately for each row of the virtual table-so-far, as defined by the parent query. In this case, that means that the first inner query
SELECT experiment_id, url_id, full_url_id
FROM blockpage
WHERE experiment_id = b.experiment_id AND url_id = b.url_id
ORDER BY redirect_num DESC
LIMIT 1
will, for each value of b.experiment_id
and b.url_id
, pull a set of rows from blockpage
, sort them, and take the first. This is exactly what I was trying to achieve with the "join (select max(redirect_num) from blockpage ...)" construct in the original query, but the planner handles it much more efficiently. It winds up being even faster than the version with the WHERE duplicated.
Nested Loop Left Join (cost=17.76..162686.21 rows=8547 width=20) (actual time=17.116..28.811 rows=803 loops=1)
Output: b.experiment_id, b.url_id, blockpage.full_url_id, blockpage.redirect_num, blockpage_1.blockpage_reason_id
-> Nested Loop (cost=9.17..88989.70 rows=8547 width=16) (actual time=17.096..23.644 rows=803 loops=1)
Output: b.experiment_id, b.url_id, blockpage.full_url_id, blockpage.redirect_num
-> Index Only Scan using blockpage_experiment_id_url_id_redirect_num_blockpage_reason__ on iclab.blockpage b (cost=0.57..15293.19 rows=8547 width=8) (actual time=0.044..0.523 rows=803 loops=1)
Output: b.experiment_id, b.url_id, b.full_url_id, b.redirect_num, b.html_tag_id
Index Cond: (b.experiment_id = ANY ('{16,43}'::integer[]))
Heap Fetches: 803
-> Limit (cost=8.60..8.60 rows=1 width=16) (actual time=0.007..0.007 rows=1 loops=803)
Output: NULL::integer, NULL::integer, blockpage.full_url_id, blockpage.redirect_num
-> Sort (cost=8.60..8.60 rows=1 width=16) (actual time=0.006..0.006 rows=1 loops=803)
Output: NULL::integer, NULL::integer, blockpage.full_url_id, blockpage.redirect_num
Sort Key: blockpage.redirect_num DESC
Sort Method: quicksort Memory: 25kB
-> Index Only Scan using blockpage_experiment_id_url_id_redirect_num_blockpage_reason__ on iclab.blockpage (cost=0.57..8.59 rows=1 width=16) (actual time=0.004..0.005 rows=2 loops=803)
Output: NULL::integer, NULL::integer, blockpage.full_url_id, blockpage.redirect_num
Index Cond: ((blockpage.experiment_id = b.experiment_id) AND (blockpage.url_id = b.url_id))
Heap Fetches: 1565
-> Limit (cost=8.60..8.60 rows=1 width=16) (actual time=0.006..0.006 rows=0 loops=803)
Output: NULL::integer, NULL::integer, blockpage_1.blockpage_reason_id, blockpage_1.redirect_num
-> Sort (cost=8.60..8.60 rows=1 width=16) (actual time=0.005..0.005 rows=0 loops=803)
Output: NULL::integer, NULL::integer, blockpage_1.blockpage_reason_id, blockpage_1.redirect_num
Sort Key: blockpage_1.redirect_num
Sort Method: quicksort Memory: 25kB
-> Index Scan using blockpage_experiment_id_url_id_redirect_num_blockpage_reason__ on iclab.blockpage blockpage_1 (cost=0.57..8.59 rows=1 width=16) (actual time=0.004..0.004 rows=0 loops=803)
Output: NULL::integer, NULL::integer, blockpage_1.blockpage_reason_id, blockpage_1.redirect_num
Index Cond: ((blockpage_1.experiment_id = b.experiment_id) AND (blockpage_1.url_id = b.url_id))
Filter: blockpage_1.has_blockpage_regex
Rows Removed by Filter: 2
Planning Time: 0.382 ms
JIT:
Functions: 19
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 3.292 ms, Inlining 0.000 ms, Optimization 0.797 ms, Emission 15.897 ms, Total 19.986 ms
Execution Time: 32.324 ms
Best Answer
The correct syntax for the
EXPLAIN
call needs aSELECT
. You can't just write the bare function name in SQL:Optimization
PL/pgSQL functions are black boxes to the query planner. Queries inside are optimized just like other queries, but separately and one by one like prepared statements, and the execution plan may be cached for the duration of the session. Details:
EXPLAIN
function bodiesLike @Daniel already commented, you can use the additional module auto_explain to get more details (lots of details). Statements inside plpgsql functions are considered "nested statements". Be sure to set:
Detailed instructions:
As an exception to the rule, simple SQL functions (not plpgsql) may be "inlined", i.e. the function code is inserted into the outer query and everything is executed like there wasn't a function to begin with. The query plan includes detailed information in such cases.