Postgresql – Optimising expensive join/subquery by filtering with conditionals

join;postgresql

This is a question about short-circuiting expensive JOINs or sub-queries in Postgresql (9.5 or 9.6). I'm also be interested in hearing how people in general solve the check-then-execute problem.

I'm writing a lot of queries that should only conditionally return a result, such as if the (web) user owns the record or if the record was modified. I'm trying to prevent building expensive views inside Postgresql and multiple back-and-forth queries to check for conditions in the application itself, so I attempt to write queries that first select the correct record and show which conditions failed, and only execute the view if the conditions pass.

For instance, this checks if the (application) user owns a record before returning it:

SELECT is_owner, is_newer, json 
FROM (
     SELECT id, owner = '053bffbc-c41e-dad4-853b-ea91fc42ea18' "is_owner"
          , modified >= created "is_newer" 
     FROM datasets 
     WHERE id = '056e4eed-ee63-2add-e981-0c86b8b6a66f'
) cond
LEFT JOIN LATERAL (
     SELECT id 
     FROM datasets 
     WHERE is_owner and is_newer
) authed
    ON cond.id = authed.id
LEFT JOIN LATERAL (
     SELECT json 
     FROM view_dataset 
     WHERE id = authed.id
) dataset
    ON true;

Resulting in (is owner):

is_owner | is_newer | json
t          t          {...}

And a negative result (not owner):

is_owner | is_newer | json
f          t          NULL

So the application knows which error to return but we don't have to build or parse the view if the conditions don't pass.

However, EXPLAIN ANALYSE shows Postgresql still executes the view query in the last LEFT LATERAL JOIN even though the middle JOIN doesn't have any results, and I can't get it to short-circuit to prevent the (expensive) view_dataset SELECT from running. If I set json to null the query skips everything but the first SELECT; but if it is set to a value from the last query, it will always execute all SELECTs, so I guess the query planner thinks it has to get a result for that json field in the top SELECT query and doesn't short-circuit the JOINs.

I wonder if I can force Postgresql to drop the expensive view query.

I have also tried a CTE, which does seem to skip the JOIN query:

WITH cond as (
    SELECT id, owner = '053bffbc-c41e-dad4-853b-ea91fc42ea18' "is_owner", modified >= created "is_newer" FROM datasets WHERE id = '056e4eed-ee63-2add-e981-0c86b8b6a66f'
)
SELECT cond.id, cond.is_owner, cond.is_newer, json FROM
    (SELECT id FROM cond WHERE cond.is_owner and cond.is_newer) filtered
    LEFT JOIN LATERAL
    (SELECT id, json from view_dataset) dataset
    USING (id)
    RIGHT JOIN cond
    USING(id);

… but this query and variations are at least 2x slower.

So my question is how to maximise performance by short-circuiting JOINs or subqueries based on conditions; and I'm also interested to hear if somebody has other ideas how to implement some check-first-then-execute pattern such as checking for record ownership.

Best Answer

Not sure why authed is needed at all. What does:

SELECT is_owner, is_newer, json 
FROM (
     SELECT id, owner = '053bffbc-c41e-dad4-853b-ea91fc42ea18' "is_owner"
          , modified >= created "is_newer" 
     FROM datasets 
     WHERE id = '056e4eed-ee63-2add-e981-0c86b8b6a66f'
) cond
LEFT JOIN LATERAL (
     SELECT json 
     FROM view_dataset 
     WHERE id = cond.id
) dataset
    ON is_owner and is_newer;

by you? I also agree with the comment made by a_horse_with_no_name. LATERAL can be a tremendous help for pushing predicates down to the base tables in special cases, but it is just a sub-query in disguise so in most cases it makes more sense to do an ordinary join. Do also try:

SELECT is_owner, is_newer, json 
FROM (
     SELECT id, owner = '053bffbc-c41e-dad4-853b-ea91fc42ea18' "is_owner"
          , modified >= created "is_newer" 
     FROM datasets 
     WHERE id = '056e4eed-ee63-2add-e981-0c86b8b6a66f'
) cond
LEFT JOIN view_dataset wd
    ON wd.id = cond.id 
   AND cond.is_owner 
   AND cond.is_newer;

EDIT. table valued function

CREATE FUNCTION get_view_dataset(int,bool) 
    RETURNS setof view_dataset AS '
        SELECT * 
        FROM view_dataset wd
        WHERE wd.id = $1 
          AND $2;
    ' LANGUAGE SQL;

and then use that function in your query as:

SELECT is_owner, is_newer, json 
FROM (
     SELECT id, owner = '053bffbc-c41e-dad4-853b-ea91fc42ea18' "is_owner"
          , modified >= created "is_newer" 
     FROM datasets 
     WHERE id = '056e4eed-ee63-2add-e981-0c86b8b6a66f'
) cond
LEFT JOIN get_view_dataset(cond.id, cond.is_owner AND cond.is_newer);

All untested.