I'm encountering a strange issue occurring when accessing historical records within a temporal table. Queries that access the older entries in the temporal table via the AS OF sub-clause take longer than queries on recent historical entries.
The historical table was generated by SQL Server (includes a clustered index on the date columns and uses page compression), I have added 50 million rows to the historical table, and my queries were retrieving about 25,000 rows.
I have tried to determine the root cause of the issue but have not been able to identify it. So far I have tested:
- Creating a test table with 50 million rows with a clustered index to see if the slow down was simply due to volume. I was able to retrieve 25K rows at constant time (~400ms).
- Removing page compression from the historical table. That had no effect on the retrieval time but did significantly increase the size of the table.
- I tried accessing the rows of the history table directly using an ID column vs the date columns. This is where things were a bit more interesting. I could access older rows in the table at ~400ms where as with the AS OF sub clause it would take ~1200ms. I tried filtering on my test table on the date column and noticed a similar slowdown when compared to filtering on the ID column. This leads me to believe that the date comparisons are behind some of the slowdown.
I want to look at this more but I also want to make sure that I am not barking up the wrong tree. First, has anyone else experienced this same behavior when accessing older historical data in a temporal table (we only noticed slow downs passed 10 million rows)? Second, what are some strategies I can use to further isolate the root cause of the performance issue (I just started looking into execution plans but it is still a bit cryptic to me)?
Execution plans
These are simple retrieval queries: the first accesses older rows, the second accesses newer rows.
Older Rows ~1200ms execution time
Recent Rows ~350ms execution time
Table details
These are the columns in the temporal table. The history table has the same columns but does not have a primary key (as per the history table requirements):
Best Answer
In a comment from Zane on your question, he stated:
This is, indeed, the problem. There's no index available to push some, or all, of the predicates down to the storage engine. Microsoft recommends this baseline indexing strategy for temporal tables in the Docs article Temporal Table Considerations and Limitations:
The phrasing of that is a little confusing (to me, anyway). But the takeaway is that you could create these indexes to improve performance some, if not quite a lot:
NC index on the current table, leading with
SysEndTime
:This will allow you to avoid reading some of the rows in the current table by seeking to the appropriate end time.
CCI on the history table
This will let you get batch mode on the history table, which should make the scans much faster.
NC index on the current table, leading with
SysStartTime
:See Paul's answer to the question Most Efficient Way to Retrieve Date Ranges for more details on why indexing for date range queries is hard. Based on the logic there, it makes sense to add another NC index on the current table that leads with SysStartTime, so that the optimizer can choose which one to use based on statistics and the specific parameters of your query:
Creating the 3 indexes outlined above made a significant difference in resource usage in my test cases. I set up a test case which runs two queries that return 1.5 million total rows. Both the history and current tables have 50 million rows).
Note: To reduce SSMS overhead, I ran the test with "Discard results after execution" option enabled.
Execution Plan - Default Indexes
Logical reads: 1,330,612
CPU time: 00:00:14.718
Elapsed time: 00:00:06.198
Execution Plan - With Indexes Described Above
Logical reads: 27,656 (8,111 row store + 19,545 columnstore)
CPU time: 00:00:01.828
Elapsed time: 00:00:01.150
As you can see, all 3 measures dropped significantly - including total elapsed time, from 6 seconds to 1 second.
The other option presented by the Docs article is to forgo the two NC indexes on the current table in favor of a clustered columnstore index. In my test, performance was very similar to the indexing solution described above.