PostgreSQL – Optimize Conditional Subqueries

execution-planexplainoptimizationpostgresqlsubquery

I have the following query:

SELECT id,
   email,
   first_name as "firstName",
   last_name as "lastName",
   is_active as "isActive",
   password,
   access,
   CASE
     WHEN access < 3 THEN (
       SELECT
         CASE WHEN count(*) = 1 THEN true ELSE false END
       FROM user_rating_entity ure
       WHERE ure.user_id = u.id
         AND ure.rating_entity_id = :re_id
     )
     ELSE true
   END as "isResponsible"
FROM users u
WHERE u.id = :id

If access > 3, field "isResponsible" should be directly set to true and the subquery should not be executed. I used explain analyze with both cases where access >= and < to 3 but I get the same output.

Why is that so?

Best Answer

There are three important parts to reading query plans here,

Did it run. If so,
How many times?
Was it correlated?

Sample Data

You didn't provide any sample data, so let's create some.

CREATE TABLE foo AS
SELECT x FROM generate_series(1,100) AS x;

And, now let's run a basic query with subquery, outside of the possible range of execution.

EXPLAIN ANALYZE
SELECT
  x,
  (CASE WHEN x>200 THEN (SELECT sum(x) FROM foo) END)
FROM foo;

The plan will show that the case is accompanied for, but never executed.

 Seq Scan on foo  (cost=2.26..4.51 rows=100 width=4) (actual time=0.017..0.047 rows=100 loops=1)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=2.25..2.26 rows=1 width=4) (never executed)
           ->  Seq Scan on foo foo_1  (cost=0.00..2.00 rows=100 width=4) (never executed)
 Planning time: 0.101 ms
 Execution time: 0.118 ms
(6 rows)

You can see that with (never executed) on the Aggregate line. However, if we set it to something like CASE WHEN x>20 THEN (SELECT sum(x) FROM foo you'll see a lot more

 Seq Scan on foo  (cost=2.26..4.51 rows=100 width=4) (actual time=0.020..0.095 rows=100 loops=1)
   InitPlan 1 (returns $0)
     ->  Aggregate  (cost=2.25..2.26 rows=1 width=4) (actual time=0.043..0.043 rows=1 loops=1)
           ->  Seq Scan on foo foo_1  (cost=0.00..2.00 rows=100 width=4) (actual time=0.006..0.019 rows=100 loops=1)
 Planning time: 0.092 ms
 Execution time: 0.158 ms
(6 rows)

Here we can see that the Aggregate is looped through loops=1 time. PostgreSQL realizes that it isn't a correlated subquery and it's a just a reduces it to a literal (essentially). Now let's make sure it's correlated.

EXPLAIN ANALYZE
SELECT
  x,
  (CASE WHEN x>20 THEN (SELECT sum(f2.x)+f1.x FROM foo AS f2) END)
FROM foo AS f1;

Now you'll see this plan

 Seq Scan on foo f1  (cost=0.00..228.50 rows=100 width=4) (actual time=0.020..3.210 rows=100 loops=1)
   SubPlan 1
     ->  Aggregate  (cost=2.25..2.26 rows=1 width=4) (actual time=0.038..0.038 rows=1 loops=80)
           ->  Seq Scan on foo f2  (cost=0.00..2.00 rows=100 width=4) (actual time=0.005..0.017 rows=100 loops=80)
 Planning time: 0.104 ms
 Execution time: 3.272 ms

Here the key is that the aggregate has loops=80 which itself requires loops=80 seq scans.

This is all general, but it's all I can give without your sample data, or query plans.

Best Answer

Sample Data

Related Solutions

MySQL optimization – year column grouping – using temporary table, filesort

Sql-server – Logic of Assigning values to variables in Scalar-valued Functions

Related Question