PostgreSQL: Immutable, Volatile, Stable

postgresqlterminology

I'm unclear about the true meaning in the definitions for IMMUTABLE, VOLATILE and STABLE functions.

I read the documentation, specifically the definitions of each.

IMMUTABLE indicates that the function cannot modify the database and
always returns the same result when given the same argument values; that is, it does not do database lookups or otherwise use
information not directly present in its argument list. If this option
is given, any call of the function with all-constant arguments can be
immediately replaced with the function value.

STABLE indicates that the function cannot modify the database, and
that within a single table scan it will consistently return the same
result for the same argument values, but that its result could change
across SQL statements. This is the appropriate selection for functions
whose results depend on database lookups, parameter variables (such as
the current time zone), etc. (It is inappropriate for AFTER triggers
that wish to query rows modified by the current command.) Also note
that the current_timestamp family of functions qualify as stable,
since their values do not change within a transaction.

VOLATILE indicates that the function value can change even within a
single table scan, so no optimizations can be made. Relatively few
database functions are volatile in this sense; some examples are
random(), currval(), timeofday(). But note that any function that has
side-effects must be classified volatile, even if its result is quite
predictable, to prevent calls from being optimized away; an example is
setval().

My confusion comes in with the condition for IMMUTABLE and STABLE that the function ALWAYS or CONSISTENTLY returns the same result given the same arguments.

The IMMUTABLE definition states that the function does not database lookups or otherwise use information not directly present in its argument list. So, to me, that means such functions are used to manipulate data provided by the client, and should not have SELECT statements…although that just sounds a bit odd to me.

With STABLE, the definition is similar in that it says it should consistently return the same result. So, to me, that means that everytime the function is called with the same arguments, it should return the same results (same exact rows, every single time).

So, to me…that means that any function that performs a SELECT on a table or tables that can be updated, should only be volatile.

But, again…that doesn't sound right to me.

Bringing this back to my use-case, I am writing functions that perform SELECT statements with multiple JOIN's on tables that are constantly being added to, so the function calls would be expected to return different results each time it's called, even with the same arguments.

So, does that mean that my functions should be VOLATILE? Even though the documentation indicates relatively few database functions are volatile in this sense?

Thank you!

Best Answer

IMMUTABLE must be a pure function, whose results depend only on its inputs. This is a very strict requirement; they cannot call other non-immutable functions, they cannot access tables, they cannot access the value of configuration properties, etc.

STABLE can use any inputs that are themselves STABLE: other STABLE or IMMUTABLE functions, and SELECT queries of tables. It's safe to query tables because the function's view of those tables will not change within the query's current snapshot. You can access GUC values (current_setting(...)) so long as you know they won't also be assigned to within the current statement.

VOLATILE functions are everything that doesn't fit the above:

Anything with side-effects
Anything that does writes
Anything that queries external data not managed by the PostgreSQL snapshot
...

In general, just leave everything VOLATILE unless you have a good reason not to.

The main reason to use IMMUTABLE is when writing functions that are to be used as part of index expressions.

Base query

First of all, your base query is needlessly complex and possibly incorrect. Consider instead:

(TABLE a EXCEPT ALL TABLE b)  -- ALL keeps duplicate rows and is faster
UNION ALL
(TABLE b EXCEPT ALL TABLE a);

I doubt you want to fold duplicates. Typically, tables do not hold complete duplicates to begin with, and omitting the ALL key word would instruct Postgres to try and fold duplicates in every step, which is just a waste of time. Details:

Select rows which are not present in other table

Even if completely duplicate rows are possible, it might be best to return all of them to avoid misleading answers. If you want to fold duplicates in the result anyway, a single UNION does the job:

(TABLE a EXCEPT ALL TABLE b)
UNION
(TABLE b EXCEPT ALL TABLE a);

However, a query with NATURAL FULL OUTER JOIN is more efficient, doing almost the same:

SELECT *
FROM   a NATURAL FULL OUTER JOIN b
WHERE  a IS NULL OR b IS NULL;

There are two subtle, exotic corner cases:

This returns rows that are all NULL to begin with. It would be an odd table design that allows such rows, though.
This does not return duplicate rows that are not matched with the same number of duplicate rows in the other table: a single match is enough to eliminate all - which is subtly different from folding duplicates in the result! Have a look at the fiddle below. Again, it would be an odd table design that allows completely duplicate rows.

Function

What you have so far does not work for several reasons.

To return actual table rows dynamically (and not just a count or a text representation), you need to use a polymorphic type.

Since the second table is bound to have a compatible row type (as per my assumption) it's enough to hand in just the table name for that.

CREATE OR REPLACE FUNCTION f_tbl_diff(_tbl1_type ANYELEMENT, _tbl2 text)
  RETURNS SETOF ANYELEMENT AS
$func$
BEGIN
   RETURN QUERY EXECUTE format('
      SELECT *
      FROM   %1$s NATURAL FULL OUTER JOIN %2$I
      WHERE  %1$s IS NULL OR %2$I IS NULL'
    , pg_typeof(_tbl1_type), _tbl2);
END
$func$  LANGUAGE plpgsql;

Call:

SELECT * FROM f_tbl_diff(NULL::a, 'b');

Note the special syntax for the first argument! We hand in an actual row defining the return type, not just a table name. Related answer on SO with ample details (scroll to the last chapter):

Refactor a PL/pgSQL function to return the output of various SELECT queries

This returns all rows that have no exact, complete match in the other table

SQL Fiddle also demonstrating exotic corner cases.

About the dynamic SQL:

SQL injection in Postgres functions vs prepared queries

Best Answer

Related Solutions

Postgresql – Can stable (or immutable) functions call volatile functions

Postgresql – Find unmatched rows between two tables dynamically

Base query

Function

Related Question