Postgresql – Why is this PostgreSQL volatile function faster than an inline subquery

postgresqlpostgresql-performance

The following examples have been tested in PostgreSQL versions 11 and 12.

A few of the tables involved have around 2M rows, others not more than 250k.

I have very big and nested queries like this (not the actual query, just an example):

SELECT coalesce(column1, 1) AS label1,
       coalesce(column2, 2) AS label2,
       coalesce(column3, 3) AS label3
FROM
  (SELECT table2.some_id AS label4,
                       table2.id AS label5,
                       table3.some_date AS label6,
                       CASE
                           WHEN (table2.some_value IS NOT NULL
                                 OR table2.some_other_value IS NOT NULL) THEN least(table2.some_value * 1, greatest(table4.table4, table2.some_other_value * 1))
                           ELSE table4.table4
                       END AS label6
   FROM table3
   JOIN
     (SELECT coalesce(table5.table5) AS table4,
             coalesce(table5.table5_1) AS table5_1,
             coalesce(table5.table5_2) AS table5_2,
             coalesce(table5.table5_3) AS table54_3
      FROM
        (SELECT table2.some_value * 1 AS table5,
                table2.some_id AS table5_some_id,
                table2.id AS table5_id,
                table3.some_date AS table5_some_date
         FROM table3
         JOIN table2 ON table2.some_other_id IN
           (SELECT unnest(something_nested.ids) AS unnest_1
            FROM
              (SELECT ...

The query above takes around 49s to execute.

Then I decided to convert some of these subqueries into functions. It was literally just a cut/paste of some pieces of the big query, resulting in a final query that reads like this:

SELECT coalesce(column1, 1) AS label1,
       coalesce(column2, 2) AS label2,
       coalesce(column3, 3) AS label3
FROM some_function()

-- some_function created as "STABLE":
SELECT table2.some_id AS label4,
       table2.id AS label5,
       table3.some_date AS label6,
       CASE
           WHEN (table2.some_value IS NOT NULL
                 OR table2.some_other_value IS NOT NULL) THEN least(table2.some_value * 1, greatest(table4.table4, table2.some_other_value * 1))
           ELSE table4.table4
       END AS label6
FROM some_other_function()

The query above takes the same 49s to execute.

However, as soon as I changed one function (any of them) to VOLATILE, the query magically executes in less than 250ms.

My questions are:

  • Why VOLATILE is way faster than STABLE in this case? Is it safe to use VOLATILE?
  • Why declaring subqueries as function executes faster than just inlining them? Is there a flag or some way to optimise this kind of scenario?

I tried to add the execution plan here but the text body is limited to 30000 characters and one plan alone is bigger than that.

This is the obfuscated execution plan for the first query (inlined subqueries) https://explain.depesz.com/s/A3mG

This is the obfuscated execution plan for the query with volatile functions that runs super fast https://explain.depesz.com/s/Yhs1


This configuration did the trick, the first query now runs in 230ms:

SET enable_nestloop = off;

I'm interested to know, behind the scenes, what Postgres is doing that is making these queries run so quickly.

Best Answer

Incorrect volatility labels can prevent function inlining. See:

Choose the correct volatility setting. (Leave it at its default VOLATILE if in doubt.)

Your case seems to have a superposition of multiple problems. Hard to diagnose, especially with the limited, obfuscated information we have. Seems like the right function volatility tilted the scales for a favorable query plan (by chance?).

In your slow query plan, estimates are (dramatically) off for every Nested Loop node. Like:

Nested Loop (cost=1,440.41..6,158.80 rows=1 width=72) (actual time=0.051..226.283 rows=984 loops=264)

I would start investigating why Postgres expects too few rows. Read up on ANALYZE and column statistics. Running with SET enable_nestloop = off is a very ugly emergency solution to cover up the underlying issue. Restrict the setting to the problem query: SET LOCAL enable_nestloop = off in the same transaction. And only use it temporarily, if at all.

Some pointers to where to investigate: