SQL Server Query Optimization – Date Range in WHERE Clause and Execution Plan Issues

datetimeexecution-plansql server

I have the following query in SQL server:

declare @timestamp_start date = getdate() - 1
declare @timestamp_end date = getdate()

SELECT  
    my_date = a.itemTimestamp
FROM
    my_table a with(nolock)
WHERE 
    (@timestamp_start is null OR a.itemTimestamp >= @timestamp_start)
    AND (@timestamp_end is null OR a.itemTimestamp < @timestamp_end)

It's extremely slow (90s for < 2000 records), because the execution plan uses the clustered index of the PK column for 'my_table':

Why is this happening and how can I optimize the query performance?

Best Answer

What is the datatype of itemTimestamp? Is that column indexed? Is the column nullable?

Assuming the answers are datetime, Yes, and No respectively you might consider

WHERE  a.itemTimestamp >= isnull(@timestamp_start, '17530101')
       AND a.itemTimestamp < isnull(@timestamp_end, '99991231')

If you were to change to datetime2 and actually store dates before 1753 you would need to change the lower bound condition to 00010101. This also won't bring back dates matching the higher bound of 9999-12-31 - but in the real world this is unlikely to cause a problem.

The Seek

When the LIKE specification is in a variable:

WHERE DESCRIPTOR_TEXT LIKE @ccode

...SQL Server needs to build a plan that will work whatever the variable contains each time the cached execution plan is executed. It would also like to be able to seek if the variable contents are compatible. The solution is to use a dynamic seek, as I explain in the following article:

Dynamic Seeks and Hidden Implicit Conversions

This can result in a much better plan if the variable contains something that can be transformed to an efficient seek, which usually means no leading wildcards. If there is a leading wildcard, the dynamic seek may turn out to be less efficient than a simple scan with a residual LIKE predicate. For example, a full scan may issue more, and larger, read-ahead reads.

To be clear, a dynamic seek can still be used if the variable contains a leading wildcard, but the start and end points of the seek will cover the whole structure.

The Scan

Making the leading-wildcard explicit in a string literal makes it clear that a seek cannot be used, so you get a scan:

WHERE DESCRIPTOR_TEXT LIKE '%NIP%'

The downside is that this plan cannot be reused for different string literals, which may cause excessive plan cache usage.

The combination

Using a variable with the recompile option:

WHERE DESCRIPTOR_TEXT LIKE @ccode
OPTION (RECOMPILE)

...means SQL Server can 'sniff' the value of the variable at runtime, embed the value it finds into the query, and produce a plan optimized for that specific value.

The resulting execution plan will not be cached, so there is no possibility of plan reuse, but this also means you will not suffer from plan cache bloat either. There will be a small overhead on each call for the fresh compilation, but this is often a trade-off that makes sense: If the variable contains some seekable, you'll get a seek plan; otherwise, a scan. More details in my article:

Parameter Sniffing, Embedding, and the RECOMPILE Options

SQL Server Index – Why Use Non-Clustered Instead of Clustered

The optimizer has a choice between two main strategies:

Scan the table (the clustered index) checking every row to see if LoanNum = 2712.
Scan & Lookup
- Scan the nonclustered index to find rows where LoanNum = 2712
- Look up the column data for the matched rows not covered by the nonclustered index.

The key point is that the nonclustered index is smaller, so scanning it is expected to be cheaper. This might seem counterintuitive because the clustered index definition has the same keys, and the nonclustered index has included columns, but the point is that the clustered index includes all columns stored in-row - the leaf (lowest) level of the clustered index literally is the in-row data.

For a small number of expected matches, the cost saving from scanning the smaller index is more than enough to compensate for the key lookups.

Incidentally, you may find that removing the WHERE 1 = 1 from your query causes the optimizer to choose the clustered index scan. The (redundant) constant-to-constant comparison prevents SQL Server parameterizing the query, so estimates are based on statistical information about LoanNum 2712. If the query is parameterized, SQL Server will use the average distribution of LoanNum values, which might result in a higher number of expected rows, and a change of plan choice.

Best Answer

Related Solutions

SQL Server Execution Plan Changes – Causes and Solutions

The Seek

The Scan

The combination

SQL Server Index – Why Use Non-Clustered Instead of Clustered

Related Question