Postgresql – Can declaring function volatility IMMUTABLE harm performance


Postgres functions are declared with volatility classification VOLATILE, STABLE or IMMUTABLE. The project is known to be very strict with these labels for built-in functions. And with good reason. Prominent example: expression indexes only allow IMMUTABLE functions and those have to be truly immutable to avoid incorrect results.

User-defined functions are still free to be declared as the owner chooses. The manual advises:

For best optimization results, you should label your functions with the strictest volatility category that is valid for them.

… and adds an extensive list of things that can go wrong with an incorrect volatility label.

Still, there are cases where faking immutability makes sense: namely, when you know the function is, in fact, immutable within your scope.
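For instance (a sketch of my own; the table and column names are assumed): `to_char()` is only STABLE because its output can depend on locale settings, but with a fixed format string like `'MMDD'` the result never actually changes, so one might fake IMMUTABLE to allow an expression index:

```sql
-- Hypothetical example: to_char() is only STABLE (its output can depend
-- on locale settings), but with the fixed format 'MMDD' the result is
-- effectively immutable. Faking IMMUTABLE permits an expression index:
CREATE FUNCTION f_mmdd_fake(date) RETURNS int LANGUAGE sql IMMUTABLE AS
$$SELECT to_char($1, 'MMDD')::int$$;

CREATE INDEX tbl_mmdd_idx ON tbl (f_mmdd_fake(the_date));  -- tbl / the_date assumed
```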

All possible implications for data integrity aside, what is the effect on performance? One might assume that declaring a function IMMUTABLE can only be beneficial to performance. Is that so?

Can declaring function volatility IMMUTABLE harm performance?

Let's assume current Postgres 10 to narrow it down, but all recent versions are of interest.

Best Answer

Yes, it can harm performance.

Simple SQL functions can be "inlined" in the calling query. Quoting the Postgres Wiki:

SQL functions (i.e. LANGUAGE SQL) will, under certain conditions, have their function bodies inlined into the calling query rather than being invoked directly. This can have substantial performance advantages since the function body becomes exposed to the planner of the calling query, which can apply optimizations such as constant-folding, qual pushdown, and so on.

Bold emphasis mine.

To enforce correctness, there are a number of preconditions. One of them:

if the function is declared IMMUTABLE, then the expression must not invoke any non-immutable function or operator
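To illustrate inlining itself (a minimal sketch of my own; the table name is assumed): when a simple SQL function meets all preconditions, the planner substitutes the function body into the query, so the plan shows the bare expression rather than a function call:

```sql
-- Minimal sketch: this function meets the inlining preconditions
-- (truly IMMUTABLE, single SELECT of an immutable expression):
CREATE FUNCTION f_add(int, int) RETURNS int LANGUAGE sql IMMUTABLE AS
$$SELECT $1 + $2$$;

-- The plan output shows (a + 1), not a call to f_add(a, 1):
EXPLAIN (COSTS OFF) SELECT f_add(a, 1) FROM tbl;  -- tbl is assumed
```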

Meaning, SQL functions that call any non-immutable function, but are declared IMMUTABLE nonetheless, are excluded from this optimization. Prompted by related answers on Stack Overflow, I ran extensive tests, basically comparing these two variants of a simple SQL function (mapping dates to an integer, ignoring the year, which doesn't matter for the purpose):

CREATE FUNCTION f_mmdd_tc_s(date) RETURNS int LANGUAGE sql STABLE    AS
$$SELECT to_char($1, 'MMDD')::int$$;

CREATE FUNCTION f_mmdd_tc_i(date) RETURNS int LANGUAGE sql IMMUTABLE AS
$$SELECT to_char($1, 'MMDD')::int$$;  -- cannot be inlined!

The Postgres function to_char() is only STABLE, not IMMUTABLE (this holds for all of its overloaded variants, for reasons beyond the scope of this answer). So the second function fakes IMMUTABLE, and turns out to be about 5x slower in a simple test:
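You can verify the declared volatility of all to_char() variants in the system catalog (a small diagnostic query of my own):

```sql
-- List all overloaded to_char() variants and their declared volatility.
-- provolatile: 'i' = IMMUTABLE, 's' = STABLE, 'v' = VOLATILE
SELECT oid::regprocedure AS function_signature, provolatile
FROM   pg_proc
WHERE  proname = 'to_char';
```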

db<>fiddle here

This specific example can be replaced with the equivalent:

CREATE FUNCTION f_mmdd(date) RETURNS int LANGUAGE sql IMMUTABLE AS
$$SELECT (EXTRACT(month FROM $1) * 100 + EXTRACT(day FROM $1))::int$$;

This might seem more expensive, with two function calls and more computation. But the IMMUTABLE label is true this time. Plus, extract() is faster than to_char() to begin with, and coercing text to integer is more expensive than casting a number, too.
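A rough way to reproduce the comparison yourself (my own sketch, timing each variant over one year of dates with generate_series()):

```sql
-- Generate one year of dates and time the call with EXPLAIN ANALYZE;
-- generate_series() returns timestamps here, hence the cast to date:
EXPLAIN ANALYZE
SELECT f_mmdd(d::date)
FROM   generate_series(timestamp '2024-01-01'
                     , timestamp '2024-12-31'
                     , interval '1 day') d;

-- Repeat with f_mmdd_tc_s(d::date) and f_mmdd_tc_i(d::date) to compare.
```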

It turned out 2x as fast as the faster variant above (and 10x as fast as the slower one). The point being: use truly IMMUTABLE functions where possible; then you don't have to "cheat" to begin with.