Postgres – When to Mark Functions as PARALLEL RESTRICTED vs PARALLEL SAFE

functionsparallelismpostgresql

The manual for Postgres 12 says:

The following operations are always parallel restricted.

  • Scans of common table expressions (CTEs).
  • Scans of temporary tables.

Further down on the same manual page:

[…] Similarly, functions must be marked PARALLEL RESTRICTED if they access
temporary tables, client connection state, cursors, prepared
statements, or miscellaneous backend-local state which the system
cannot synchronize across workers. For example, setseed and random are
parallel restricted for this last reason.

No mention of CTEs. Now I am unsure whether I can use PARALLEL SAFE for functions containing a CTE. It would make sense to me for those to be only PARALLEL RESTRICTED.

Context: I have to determine the best label for existing user-defined functions. The setting is new since Postgres 9.6, and can have a massive impact on performance as operations involving functions that are not PARALLEL SAFE will not be performed by parallel workers, PARALLEL RESTRICTED only by the leader. (And PARALLEL USAFE disables parallelism altogether.)

I posted a related question on pgsql-general.

Best Answer

If a function contains an query that uses a CTE, and that function is used by a parallel worker process, the query and the CTE will only be executed in the private process context of the parallel worker. There is no shared state involved in creating or scanning that CTE.

So it is safe to mark the function PARALLEL SAFE.

The quotation from the documentation says that a CTE defined in a query that should be parallelized can only be scanned by the leader process, not by the parallel workers, precisely for the reason that the CTE is not shared between processes. This has no impact on CTEs defined in queries running in functions called by parallel workers, because such nested queries won't be parallelized anyway:

Even when it is in general possible for parallel query plans to be generated, the planner will not generate them for a given query if any of the following are true:

[...]

  • The query is running inside of another query that is already parallel. For example, if a function called by a parallel query issues an SQL query itself, that query will never use a parallel plan.