Using temporal tables one is able to use the FOR_SYSTEM TIME clause with combinations with other options to get valid rows for specific time intervals.
For example, we have:
- FOR SYSTEM_TIME ALL
- FOR SYSTEM_TIME AS OF @dt
- FOR SYSTEM_TIME clause is FROM @start TO @end
- FOR SYSTEM_TIME clause is BETWEEN @start AND @end
- FOR SYSTEM_TIME clause is CONTAINED IN (@start, @end)
What is said in the documentation and I believe is not true is:
If you search for non-current row versions only, we recommend you to
use CONTAINED IN as it works only with the history table and will
yield the best query performance.
For example, you can see below both table are queried:
Fortunately, Itzik Ben-Gan advises to create check constraint to the temporal table like this in order to help the SQL Engine:
ALTER TABLE [dbo].[SecurityUsersEncrypted]
ADD CONSTRAINT [CK_SecurityUsersEncrypted_SysEndTime]
CHECK ([SysEndTime] = '9999-12-31 23:59:59.9999999');
And the main table is not queried:
If I am using the maximum end period (forcing reads of the main table), the execution plan is changed, too (which is normal):
And if I use one of the other clauses, even my query do not return data from the main table (only history data is returned), the optimization is not applied (as pointed in the documentation):
So, my questions are:
- Why in the documentation is said a optimization is working for CONTAINED IN clause, but actually you need to create such check constraint to use it? Should I always create such constraint or this is going to change?
- All clauses can return rows from both tables and one of them. Why this optimization is working only for the CONTAINED IN clause?
Best Answer
The optimization is really occurring - even though the main table index scan appears in the execution plan, it is never executed because it's behind a filter with a startup expression predicate.
See below for a more detailed explanation.
I don't think anyone but the SQL Server product team could answer that one, unfortunately.
The execution plan is a bit misleading here. If I create a temporal table:
And then load it with data:
And then update some rows (so that I have history rows to query):
And run this query (that only returns history rows) with stats I/O on:
Here's what I get:
So according to stats I/O, it really did zero reads from the base table - it only read from the history table. Here's the execution plan.
Notice that all the predicates on the base table's index scan operator are residual predicates. So in theory, the whole table should get read (even though there are no matching rows). So there is some trickery going on on.
It doesn't read anything from the base table because the predicate in the filter operator must always evaluate to false:
Note that the predicate is a Startup Expression, so the Clustered Index Scan below it does not get started when the expression evaluates to false.
sysmaxdatetime
is an intrinsic (internal) function that presumably returns the maximum date/time value at the given precision.The "Number of Executions" on the Clustered Index Scan is 4, but this is simply an internal quirk related to the way parallel plans work - a thread does call
Open()
on the subtree to ensure all necessary threads get created and initialized, but no real work is done. In a serial version of the same plan, the number of executions would indeed be zero.What about the CHECK constraint?
For what's worth, it appears that adding the check constraint is still useful as a performance tuning option, as it reduces logical reads on the history table by quite a bit.
Before the constraint:
After the constraint: