Sql-server – SARGable WHERE clause for two date columns

indexperformancequery-performancesql server

I have what is, to me, an interesting question on SARGability. In this case, it's about using a predicate on the difference between two date columns. Here's the setup:

USE [tempdb]
SET NOCOUNT ON  

IF OBJECT_ID('tempdb..#sargme') IS NOT NULL
BEGIN
DROP TABLE #sargme
END

SELECT TOP 1000
IDENTITY (BIGINT, 1,1) AS ID,
CAST(DATEADD(DAY, [m].[severity] * -1, GETDATE()) AS DATE) AS [DateCol1],
CAST(DATEADD(DAY, [m].[severity], GETDATE()) AS DATE) AS [DateCol2]
INTO #sargme
FROM sys.[messages] AS [m]

ALTER TABLE [#sargme] ADD CONSTRAINT [pk_whatever] PRIMARY KEY CLUSTERED ([ID])
CREATE NONCLUSTERED INDEX [ix_dates] ON [#sargme] ([DateCol1], [DateCol2])

What I'll see pretty frequently, is something like this:

/*definitely not sargable*/
SELECT
    * ,
    DATEDIFF(DAY, [s].[DateCol1], [s].[DateCol2])
FROM
    [#sargme] AS [s]
WHERE
    DATEDIFF(DAY, [s].[DateCol1], [s].[DateCol2]) >= 48;

…which definitely isn't SARGable. It results in an index scan, reads all 1000 rows, no good. Estimated rows stink. You'd never put this in production.

No sir, I didn't like it.

It would be nice if we could materialize CTEs, because that would help us make this, well, more SARGable-er, technically speaking. But no, we get the same execution plan as up top.

/*would be nice if it were sargable*/
WITH    [x] AS ( SELECT
                * ,
                DATEDIFF(DAY, [s].[DateCol1], [s].[DateCol2]) AS [ddif]
               FROM
                [#sargme] AS [s])
     SELECT
        *
     FROM
        [x]
     WHERE
        [x].[ddif] >= 48;

And of course, since we are not using constants, this code changes nothing, and is not even half SARGable. No fun. Same execution plan.

/*not even half sargable*/
SELECT
    * ,
    DATEDIFF(DAY, [s].[DateCol1], [s].[DateCol2])
FROM
    [#sargme] AS [s]
WHERE
    [s].[DateCol2] >= DATEADD(DAY, 48, [s].[DateCol1])

If you're feeling lucky, and you're obeying all the ANSI SET options in your connection strings, you could add a computed column, and search on it…

ALTER TABLE [#sargme] ADD [ddiff] AS 
DATEDIFF(DAY, DateCol1, DateCol2) PERSISTED

CREATE NONCLUSTERED INDEX [ix_dates2] ON [#sargme] ([ddiff], [DateCol1], [DateCol2])

SELECT [s].[ID] ,
       [s].[DateCol1] ,
       [s].[DateCol2]
FROM [#sargme] AS [s]
WHERE [ddiff] >= 48

This will get you an index seek with three queries. The odd man out is where we add 48 days to DateCol1. The query with DATEDIFF in the WHERE clause, the CTE, and the final query with a predicate on the computed column all give you a much nicer plan with much nicer estimates, and all that.

I could live with this.

Which brings me to the question: in a single query, is there a SARGable way to perform this search?

No temp tables, no table variables, no altering the table structure, and no views.

I'm fine with self-joins, CTEs, subqueries, or multiple passes over the data. Can work with any version of SQL Server.

Avoiding the computed column is an artificial limitation because I'm more interested in a query solution than anything else.

Best Answer

Just adding this quickly so it exists as an answer (though I know it's not the answer you want).

An indexed computed column is usually the right solution for this type of problem.

It:

  • makes the predicate an indexable expression
  • allows automatic statistics to be created for better cardinality estimation
  • does not need to take any space in the base table

To be clear on that last point, the computed column is not required to be persisted in this case:

-- Note: not PERSISTED, metadata change only
ALTER TABLE #sargme
ADD DayDiff AS DATEDIFF(DAY, DateCol1, DateCol2);

-- Index the expression
CREATE NONCLUSTERED INDEX index_name
ON #sargme (DayDiff)
INCLUDE (DateCol1, DateCol2);

Now the query:

SELECT
    S.ID,
    S.DateCol1,
    S.DateCol2,
    DATEDIFF(DAY, S.DateCol1, S.DateCol2)
FROM
    #sargme AS S
WHERE
    DATEDIFF(DAY, S.DateCol1, S.DateCol2) >= 48;

...gives the following trivial plan:

Execution plan

As Martin Smith said, if you have connections using the wrong set options, you could create a regular column and maintain the computed value using triggers.

All this only really matters (code challenge aside) if there's a real problem to solve, of course, as Aaron says in his answer.

This is fun to think about, but I don't know any way to achieve what you want reasonably given the constraints in the question. It seems like any optimal solution would require a new data structure of some type; the closest we have being the 'function index' approximation provided by an index on a non-persisted computed column as above.