SQL Server – Why do we need to optimize the CONTAINED IN clause of FOR SYSTEM_TIME using a check constraint?

sql-server, sql-server-2016, t-sql, temporal-tables

Using temporal tables, one is able to use the FOR SYSTEM_TIME clause in combination with various sub-clauses to get the rows that were valid for a specific point or interval in time.

For example, we have:

  • FOR SYSTEM_TIME ALL
  • FOR SYSTEM_TIME AS OF @dt
  • FOR SYSTEM_TIME FROM @start TO @end
  • FOR SYSTEM_TIME BETWEEN @start AND @end
  • FOR SYSTEM_TIME CONTAINED IN (@start, @end)
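
As a minimal sketch, a CONTAINED IN query looks like this (using the SecurityUsersEncrypted table from the example below; the variable values are illustrative):

```sql
DECLARE @start datetime2 = '2018-01-01',
        @end   datetime2 = '2018-06-30';

-- Returns only row versions that were both opened and closed
-- strictly within the (@start, @end) interval - i.e. history rows.
SELECT *
FROM [dbo].[SecurityUsersEncrypted]
    FOR SYSTEM_TIME CONTAINED IN (@start, @end);
```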

The documentation makes the following claim, which I believe is not true:

If you search for non-current row versions only, we recommend you to
use CONTAINED IN as it works only with the history table and will
yield the best query performance.

For example, you can see below that both tables are queried:

(execution plan screenshot: both tables queried)

Fortunately, Itzik Ben-Gan advises creating a check constraint on the temporal table, like this, in order to help the SQL Server engine:

ALTER TABLE [dbo].[SecurityUsersEncrypted]
ADD CONSTRAINT [CK_SecurityUsersEncrypted_SysEndTime]
CHECK ([SysEndTime] = '9999-12-31 23:59:59.9999999');

And the main table is not queried:

(execution plan screenshot: main table not queried)

If I use the maximum end period as the upper bound (forcing reads of the main table), the execution plan changes too (which is expected):

(execution plan screenshot: main table queried again)
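
The exact query is not shown above; a rough reconstruction would use the maximum datetime2 value as the upper bound, so that current rows (whose SysEndTime equals that value) also qualify:

```sql
-- Hypothetical example: using the end-of-time sentinel as the upper
-- bound means current rows can be CONTAINED IN the interval, so the
-- main table must be read.
SELECT *
FROM [dbo].[SecurityUsersEncrypted]
    FOR SYSTEM_TIME CONTAINED IN ('2018-01-01', '9999-12-31 23:59:59.9999999');
```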

And if I use one of the other clauses, even when my query does not return data from the main table (only history rows are returned), the optimization is not applied (as noted in the documentation):

(execution plan screenshot: optimization not applied for other clauses)

So, my questions are:

  1. Why does the documentation say an optimization works for the CONTAINED IN clause, when in practice you need to create such a check constraint to get it? Should I always create this constraint, or is this going to change?
  2. All of the clauses can return rows from both tables or from only one of them. Why does this optimization work only for the CONTAINED IN clause?

Best Answer

Why does the documentation say an optimization works for the CONTAINED IN clause, when in practice you need to create such a check constraint to get it?

The optimization is really occurring - even though the main table index scan appears in the execution plan, it is never executed because it's behind a filter with a startup expression predicate.

See below for a more detailed explanation.

All of the clauses can return rows from both tables or from only one of them. Why does this optimization work only for the CONTAINED IN clause?

I don't think anyone but the SQL Server product team could answer that one, unfortunately.


The execution plan is a bit misleading here. If I create a temporal table:

CREATE TABLE dbo.ThisIsJustATest
(
    [Id] int IDENTITY(1,1) NOT NULL PRIMARY KEY CLUSTERED
  , [Junk] nvarchar(2048) NOT NULL
  , [ValidFrom] datetime2(2) GENERATED ALWAYS AS ROW START NOT NULL
  , [ValidTo] datetime2(2) GENERATED ALWAYS AS ROW END NOT NULL
  , PERIOD FOR SYSTEM_TIME (ValidFrom, ValidTo)
)
WITH (SYSTEM_VERSIONING = ON (HISTORY_TABLE = dbo.ThisIsJustATestHistory));

And then load it with data:

INSERT INTO dbo.ThisIsJustATest (Junk)
SELECT m.[text] FROM sys.messages m
OPTION (MAXDOP 1);

And then update some rows (so that I have history rows to query):

UPDATE dbo.ThisIsJustATest
SET Junk = 'Busted'
WHERE Id % 100 = 0;

And run this query (which returns only history rows) with STATISTICS IO enabled:

SET STATISTICS IO ON;

SELECT * 
FROM dbo.ThisIsJustATest FOR SYSTEM_TIME CONTAINED IN ('2018-09-11 14:17:00.96', '2018-09-11 14:18:51.35');

Here's what I get:

Table 'ThisIsJustATest'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'ThisIsJustATestHistory'. Scan count 5, logical reads 154, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

So according to stats I/O, it really did zero reads from the base table - it only read from the history table. Here's the execution plan.

Notice that all the predicates on the base table's index scan operator are residual predicates. So, in theory, the whole table should be read (even though there are no matching rows) - which means there is some trickery going on.

It doesn't read anything from the base table because the predicate in the filter operator must always evaluate to false:

'2018-09-11 14:18:51.35'=sysmaxdatetime((2))

Note that the predicate is a Startup Expression, so the Clustered Index Scan below it does not get started when the expression evaluates to false. sysmaxdatetime is an intrinsic (internal) function that presumably returns the maximum date/time value at the given precision.
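
To illustrate the startup logic (this is only a sketch of the semantics, not the internal sysmaxdatetime function): the scan is started only if the supplied upper bound equals the maximum datetime2(2) value, since only then could a current row be CONTAINED IN the interval:

```sql
-- Rough T-SQL equivalent of the startup expression's decision:
-- the maximum datetime2(2) value is '9999-12-31 23:59:59.99'.
SELECT CASE
           WHEN CAST('2018-09-11 14:18:51.35' AS datetime2(2))
              = CAST('9999-12-31 23:59:59.99' AS datetime2(2))
           THEN 'start the base-table scan'
           ELSE 'skip the base-table scan'
       END AS StartupDecision;
```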

The "Number of Executions" on the Clustered Index Scan is 4, but this is simply an internal quirk related to the way parallel plans work - a thread does call Open() on the subtree to ensure all necessary threads get created and initialized, but no real work is done. In a serial version of the same plan, the number of executions would indeed be zero.

What about the CHECK constraint?

For what it's worth, it appears that adding the check constraint is still useful as a performance-tuning option, since it reduces logical reads against the history table by quite a bit.
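
On the test table above, the constraint would look something like this (adapted from the question's example to the ValidTo column; note that the end-of-time literal must match the column's datetime2(2) precision):

```sql
ALTER TABLE dbo.ThisIsJustATest
ADD CONSTRAINT CK_ThisIsJustATest_ValidTo
CHECK ([ValidTo] = '9999-12-31 23:59:59.99');
```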

Before the constraint:

Table 'ThisIsJustATest'. Scan count 0, logical reads 0
Table 'ThisIsJustATestHistory'. Scan count 5, logical reads 160

 SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 197 ms.

After the constraint:

Table 'ThisIsJustATestHistory'. Scan count 1, logical reads 53

 SQL Server Execution Times:
   CPU time = 16 ms,  elapsed time = 197 ms.