Postgres – CTE Execution Guarantees and Inlining

postgresqlsql-standard

Following query with non deterministic function yields two same rows – the CTE is computed and re-used.

WITH foo AS (SELECT uuid_generate_v4() AS id)
SELECT id FROM foo
UNION ALL
SELECT id FROM foo

| id 
| 741f4f69-416b-4b4c-9226-559527f4a84e
| 741f4f69-416b-4b4c-9226-559527f4a84e

Is this execution guaranteed? Can postgres decide to inline it to

SELECT id FROM (SELECT uuid_generate_v4() AS id) t
UNION ALL
SELECT id FROM (SELECT uuid_generate_v4() AS id) t

| id 
| 262f0006-b4d5-440a-a86a-d79cd2684458
| c24b5835-9c5c-4d6f-b649-8e510129015b

which yields two different ids (and thus different result)?

What does the SQL standard specify? Does Postgres provide any additional guarantees? If it's not guaranteed, what is standard way to reuse statement that depends on non deterministic procedures?

Best Answer

The SQL standard does not define behaviour in such cases, leaving it to the implementation of SQL servers. It (2013 draft anyway) says, in particular, in section 7.6 <table reference>

25) A <query name> is possibly non-deterministic if the <query expression> identified by the <query name> is possibly non-deterministic.

where <query name> is what is defined by the WITH clause, while in 4.22 Determinism it mentions:

Recognizing that an operation is deterministic is a difficult task, it is in general not mandated by this International Standard.[...] For other operations, this International Standard does not label an operation as deterministic; instead it identifies certain operations as “possibly non-deterministic”. Specific definitions can be found in other subclauses relative to <value expression>, <table reference>, <table primary>, <query specification>, <query expression>, and <SQL procedure statement>.

Section 4.15.8 Syntactic analysis of derived tables and cursors can be interpreted to mean that each reference to uuid_generate_v4() in your case will produce a distinct node on the syntax tree. Whether the optimizer of a particular DBMS chooses to honour its non-deterministic nature and execute the function twice is implementation-specific though.