Postgres – CTE Execution Guarantees and Inlining

postgresqlsql-standard

Following query with non deterministic function yields two same rows – the CTE is computed and re-used.

WITH foo AS (SELECT uuid_generate_v4() AS id)
SELECT id FROM foo
UNION ALL
SELECT id FROM foo

| id 
| 741f4f69-416b-4b4c-9226-559527f4a84e
| 741f4f69-416b-4b4c-9226-559527f4a84e

Is this execution guaranteed? Can postgres decide to inline it to

SELECT id FROM (SELECT uuid_generate_v4() AS id) t
UNION ALL
SELECT id FROM (SELECT uuid_generate_v4() AS id) t

| id 
| 262f0006-b4d5-440a-a86a-d79cd2684458
| c24b5835-9c5c-4d6f-b649-8e510129015b

which yields two different ids (and thus different result)?

What does the SQL standard specify? Does Postgres provide any additional guarantees? If it's not guaranteed, what is standard way to reuse statement that depends on non deterministic procedures?

Best Answer

The SQL standard does not define behaviour in such cases, leaving it to the implementation of SQL servers. It (2013 draft anyway) says, in particular, in section 7.6 <table reference>

25) A <query name> is possibly non-deterministic if the <query expression> identified by the <query name> is possibly non-deterministic.

where <query name> is what is defined by the WITH clause, while in 4.22 Determinism it mentions:

Recognizing that an operation is deterministic is a difficult task, it is in general not mandated by this International Standard.[...] For other operations, this International Standard does not label an operation as deterministic; instead it identifies certain operations as “possibly non-deterministic”. Specific definitions can be found in other subclauses relative to <value expression>, <table reference>, <table primary>, <query specification>, <query expression>, and <SQL procedure statement>.

Section 4.15.8 Syntactic analysis of derived tables and cursors can be interpreted to mean that each reference to uuid_generate_v4() in your case will produce a distinct node on the syntax tree. Whether the optimizer of a particular DBMS chooses to honour its non-deterministic nature and execute the function twice is implementation-specific though.

Related Solutions

Conditional INSERT with Nested CTE in PostgreSQL

For the purpose of this question, I'll assume employee_details.name to be defined UNIQUE. Else, the whole operation wouldn't make sense.

You cannot nest a data-modifying CTE like you tried (as you already found out the hard way) - and you don't need to. This query would achieve your objective:

WITH e AS (
   SELECT name, employee_id
   FROM   employee_details
   WHERE  name = 'jack bauer'
   )
 , i1 AS (
   INSERT INTO employee             -- no target columns!
   SELECT                           -- empty SELECT list!
   WHERE NOT EXISTS (SELECT FROM e)
   RETURNING id
   )
 , i2 AS (
   INSERT INTO employee_details (name, employee_id) 
   SELECT 'jack bauer', id
   FROM   i1
   RETURNING name, employee_id
   )
SELECT employee_id, name FROM e
UNION ALL 
SELECT employee_id, name FROM i2;

The core feature is the INSERT with no target columns and an empty SELECT. Postgres fills all columns not listed in the SELECT with default values. This way we can replace the unconditional VALUES (default) with a conditional INSERT. The CTE i1 only inserts a row if the given name was not found.

The manual:

If no list of [target] column names is given at all, the default is all the columns of the table in their declared order; [...]

Each column not present in the explicit or implicit column list will be filled with a default value, either its declared default value or null if there is none.

This is a Postgres specific extension of the standard:

Also, the case in which a column name list is omitted, but not all the columns are filled from the VALUES clause or query, is disallowed by the standard.

The final CTE i2 only inserts a row if i1 returned a row. Voilá.

This is subject to race conditions under concurrent write load to the same tables. If you need to rule that out, you need to do more. Related:

How to use RETURNING with ON CONFLICT in PostgreSQL?

Without the complications from the conditional INSERT in the 2nd table, this would boil down to a common case of SELECT or INSERT:

Is SELECT or INSERT in a function prone to race conditions?

Aside

"id" text DEFAULT gen_random_uuid()

I'd strongly advise to use the data type uuid to store UUIDs.

Postgresql – Why does PostgreSQL interpret numeric UTC offset as POSIX and not ISO-8601

Good question, and I agree the manual can easily be misread there. Quoting the same source, but with one more leading sentence:

All timezone-aware dates and times are stored internally in UTC. They are converted to local time in the zone specified by the TimeZone configuration parameter before being displayed to the client.

PostgreSQL allows you to specify time zones in three different forms:

[...]

This part of the manual deals with what's allowed in the TimeZone configuration parameter (and, accordingly, the AT TIME ZONE construct).

The part of the manual you quote further down in your question deals with Time Zone Input in timestamp literals which follows the mentioned ISO-8601 convention.

You need this part of the manual explaining what's allowed for SET TIME ZONE (and, accordingly, the AT TIME ZONE construct):

TIME ZONE

SET TIME ZONE value is an alias for SET timezone TOvalue. The syntax SET TIME ZONE allows special syntax for the time zone specification. Here are examples of valid values:

'PST8PDT'
The time zone for Berkeley, California.

'Europe/Rome'
The time zone for Italy.

-7
The time zone 7 hours west from UTC (equivalent to PDT). Positive values are east from UTC.

INTERVAL '-08:00' HOUR TO MINUTE
The time zone 8 hours west from UTC (equivalent to PST).

[...]

In your example SET TIME ZONE '+02:00'; looks like an interval, but that's deceiving. Only explicitly declaring INTERVAL makes Postgres treat it as interval. '+02:00' is just another case of STDoffset, with no leading STD, hence defaulting to UTC. So:

SET TIME ZONE '+02:00';

... is equivalent to:

SET TIME ZONE 'UTC+02:00';

These quotes from the first manual page seem instrumental:

The default time zone is specified as a constant numeric offset from UTC.

And:

One should be wary that the POSIX-style time zone feature can lead to silently accepting bogus input, since there is no check on the reasonableness of the zone abbreviations. For example, SET TIMEZONE TO FOOBAR0 will work, leaving the system effectively using a rather peculiar abbreviation for UTC. Another issue to keep in mind is that in POSIX time zone names, positive offsets are used for locations west of Greenwich. Everywhere else, PostgreSQL follows the ISO-8601 convention that positive timezone offsets are east of Greenwich.

But the manual does not seem to mention that the STD part can be missing from the specification. Seems like a documentation bug ...

Best Answer

Related Solutions

Conditional INSERT with Nested CTE in PostgreSQL

Aside

Postgresql – Why does PostgreSQL interpret numeric UTC offset as POSIX and not ISO-8601

Related Question