PostgreSQL – Optimize Conditional Subqueries

execution-planexplainoptimizationpostgresqlsubquery

I have the following query:

SELECT id,
   email,
   first_name as "firstName",
   last_name as "lastName",
   is_active as "isActive",
   password,
   access,
   CASE
     WHEN access < 3 THEN (
       SELECT
         CASE WHEN count(*) = 1 THEN true ELSE false END
       FROM user_rating_entity ure
       WHERE ure.user_id = u.id
         AND ure.rating_entity_id = :re_id
     )
     ELSE true
   END as "isResponsible"
FROM users u
WHERE u.id = :id

If access > 3, field "isResponsible" should be directly set to true and the subquery should not be executed. I used explain analyze with both cases where access >= and < to 3 but I get the same output.

Why is that so?

Best Answer

There are three important parts to reading query plans here,

  • Did it run. If so,
  • How many times?
  • Was it correlated?

Sample Data

You didn't provide any sample data, so let's create some.

CREATE TABLE foo AS
SELECT x FROM generate_series(1,100) AS x;

And, now let's run a basic query with subquery, outside of the possible range of execution.

EXPLAIN ANALYZE
SELECT
  x,
  (CASE WHEN x>200 THEN (SELECT sum(x) FROM foo) END)
FROM foo;

The plan will show that the case is accompanied for, but never executed.

 Seq Scan on foo  (cost=2.26..4.51 rows=100 width=4) (actual time=0.017..0.047 rows=100 loops=1)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=2.25..2.26 rows=1 width=4) (never executed)
           ->  Seq Scan on foo foo_1  (cost=0.00..2.00 rows=100 width=4) (never executed)
 Planning time: 0.101 ms
 Execution time: 0.118 ms
(6 rows)

You can see that with (never executed) on the Aggregate line. However, if we set it to something like CASE WHEN x>20 THEN (SELECT sum(x) FROM foo you'll see a lot more

 Seq Scan on foo  (cost=2.26..4.51 rows=100 width=4) (actual time=0.020..0.095 rows=100 loops=1)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=2.25..2.26 rows=1 width=4) (actual time=0.043..0.043 rows=1 loops=1)
           ->  Seq Scan on foo foo_1  (cost=0.00..2.00 rows=100 width=4) (actual time=0.006..0.019 rows=100 loops=1)
 Planning time: 0.092 ms
 Execution time: 0.158 ms
(6 rows)

Here we can see that the Aggregate is looped through loops=1 time. PostgreSQL realizes that it isn't a correlated subquery and it's a just a reduces it to a literal (essentially). Now let's make sure it's correlated.

EXPLAIN ANALYZE
SELECT
  x,
  (CASE WHEN x>20 THEN (SELECT sum(f2.x)+f1.x FROM foo AS f2) END)
FROM foo AS f1;

Now you'll see this plan

 Seq Scan on foo f1  (cost=0.00..228.50 rows=100 width=4) (actual time=0.020..3.210 rows=100 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=2.25..2.26 rows=1 width=4) (actual time=0.038..0.038 rows=1 loops=80)
           ->  Seq Scan on foo f2  (cost=0.00..2.00 rows=100 width=4) (actual time=0.005..0.017 rows=100 loops=80)
 Planning time: 0.104 ms
 Execution time: 3.272 ms

Here the key is that the aggregate has loops=80 which itself requires loops=80 seq scans.

This is all general, but it's all I can give without your sample data, or query plans.