Sql-server – How to avoid using variables in WHERE clause

sql serversql-server-2008

Given a (simplified) stored procedure such as this:

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
END

If the Sale table is large the SELECT can take a long time to execute, apparently because the optimizer can't optimize due to the local variable. We tested running the SELECT part with variables then hard coded dates and the execution time went from ~9 minutes to ~1 second.

We have numerous stored procedures that query based on "fixed" date ranges (week, month, 8-week etc) so the input parameter is just @endDate and @startDate is calculated inside the procedure.

The question is, what is the best practice for avoiding variables in a WHERE clause so as not to compromise the optimizer?

The possibilities we came up with are shown below. Are any of these best practice, or is there another way?

Use a wrapper procedure to turn the variables into parameters.

Parameters don't affect the optimizer the same way local variables do.

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
   DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
   EXECUTE DateRangeProc @startDate, @endDate
END

CREATE PROCEDURE DateRangeProc(@startDate DATE, @endDate DATE)
AS
BEGIN
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
END

Use parameterized dynamic SQL.

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  DECLARE @sql NVARCHAR(4000) = N'
    SELECT
      -- Stuff
    FROM Sale
    WHERE SaleDate BETWEEN @startDate AND @endDate
  '
  DECLARE @param NVARCHAR(4000) = N'@startDate DATE, @endDate DATE'
  EXECUTE sp_executesql @sql, @param, @startDate = @startDate, @endDate = @endDate
END

Use "hard-coded" dynamic SQL.

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  DECLARE @sql NVARCHAR(4000) = N'
    SELECT
      -- Stuff
    FROM Sale
    WHERE SaleDate BETWEEN @startDate AND @endDate
  '
  SET @sql = REPLACE(@sql, '@startDate', CONVERT(NCHAR(10), @startDate, 126))
  SET @sql = REPLACE(@sql, '@endDate', CONVERT(NCHAR(10), @endDate, 126))
  EXECUTE sp_executesql @sql
END

Use the DATEADD() function directly.

I'm not keen on this because calling functions in the WHERE also affects performance.

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN DATEADD(DAY, -6, @endDate) AND @endDate
END

Use an optional parameter.

I'm not sure if assigning to parameters would have the same problem as assigning to variables, so this might not be an option. I don't really like this solution but including it for completeness.

CREATE PROCEDURE WeeklyProc(@endDate DATE, @startDate DATE = NULL)
AS
BEGIN
  SET @startDate = DATEADD(DAY, -6, @endDate)
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
END

— Update —

Thanks for suggestions and comments. After reading them I ran some timing tests with the various approaches. I'm adding the results here as a reference.

Run 1 is without a plan. Run 2 is immediately after Run 1 with exactly the same parameters so it will use the plan from run 1.

The NoProc times are for running the SELECT queries manually in SSMS outside a stored procedure.

TestProc1-7 are the queries from the original question.

TestProcA-B are based on the suggestion by Mikael Eriksson. The column in the database is a DATE so I tried passing the parameter as a DATETIME and running with implicit casting (testProcA) and explicit casting (testProcB).

TestProcC-D are based on the suggestion by Kenneth Fisher. We already use a date lookup table for other things, but we don't have one with a specific column for each period range. The variation I tried still uses BETWEEN but does it on the smaller lookup table and joins to the larger table. I'm going to investigate further as to whether we can use specific lookup tables, although our periods are fixed there are quite a few different ones.

    Total rows in Sale table: 136,424,366

                       Run 1 (ms)     Run 2 (ms)
    Procedure          CPU   Elapsed  CPU    Elapsed  Comment
    NoProc constants   6567  62199    2870   719      Manual query with constants
    NoProc variables   9314  62424    3993   998      Manual query with variables
    testProc1          6801  62919    2871   736      Hard coded range
    testProc2          8955  63190    3915   979      Parameter and variable range
    testProc3          8985  63152    3932   987      Wrapper procedure with parameter range
    testProc4          9142  63939    3931   977      Parameterized dynamic SQL
    testProc5          7269  62933    2933   728      Hard coded dynamic SQL
    testProc6          9266  63421    3915   984      Use DATEADD on DATE
    testProc7          2044  13950    1092  1087      Dummy parameter
    testProcA         12120  61493    5491  1875      Use DATEADD on DATETIME without CAST
    testProcB          8612  61949    3932   978      Use DATEADD on DATETIME with CAST
    testProcC          8861  61651    3917   993      Use lookup table, Sale first
    testProcD          8625  61740    3994  1031      Use lookup table, Sale last

Here's the test code.

------ SETUP ------

IF OBJECT_ID(N'testDimDate', N'U') IS NOT NULL DROP TABLE testDimDate
IF OBJECT_ID(N'testProc1', N'P') IS NOT NULL DROP PROCEDURE testProc1
IF OBJECT_ID(N'testProc2', N'P') IS NOT NULL DROP PROCEDURE testProc2
IF OBJECT_ID(N'testProc3', N'P') IS NOT NULL DROP PROCEDURE testProc3
IF OBJECT_ID(N'testProc3a', N'P') IS NOT NULL DROP PROCEDURE testProc3a
IF OBJECT_ID(N'testProc4', N'P') IS NOT NULL DROP PROCEDURE testProc4
IF OBJECT_ID(N'testProc5', N'P') IS NOT NULL DROP PROCEDURE testProc5
IF OBJECT_ID(N'testProc6', N'P') IS NOT NULL DROP PROCEDURE testProc6
IF OBJECT_ID(N'testProc7', N'P') IS NOT NULL DROP PROCEDURE testProc7
IF OBJECT_ID(N'testProcA', N'P') IS NOT NULL DROP PROCEDURE testProcA
IF OBJECT_ID(N'testProcB', N'P') IS NOT NULL DROP PROCEDURE testProcB
IF OBJECT_ID(N'testProcC', N'P') IS NOT NULL DROP PROCEDURE testProcC
IF OBJECT_ID(N'testProcD', N'P') IS NOT NULL DROP PROCEDURE testProcD
GO

CREATE TABLE testDimDate
(
   DateKey DATE NOT NULL,
   CONSTRAINT PK_DimDate_DateKey UNIQUE NONCLUSTERED (DateKey ASC)
)
GO

DECLARE @dateTimeStart DATETIME = '2000-01-01'
DECLARE @dateTimeEnd DATETIME = '2100-01-01'
;WITH CTE AS
(
   --Anchor member defined
   SELECT @dateTimeStart FullDate
   UNION ALL
   --Recursive member defined referencing CTE
   SELECT FullDate + 1 FROM CTE WHERE FullDate + 1 <= @dateTimeEnd
)
SELECT
   CAST(FullDate AS DATE) AS DateKey
INTO #DimDate
FROM CTE
OPTION (MAXRECURSION 0)

INSERT INTO testDimDate (DateKey)
SELECT DateKey FROM #DimDate ORDER BY DateKey ASC

DROP TABLE #DimDate
GO

-- Hard coded date range.
CREATE PROCEDURE testProc1 AS
BEGIN
   SET NOCOUNT ON
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
END
GO

-- Parameter and variable date range.
CREATE PROCEDURE testProc2(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO

-- Parameter date range.
CREATE PROCEDURE testProc3a(@startDate DATE, @endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO

-- Wrapper procedure.
CREATE PROCEDURE testProc3(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
   EXEC testProc3a @startDate, @endDate
END
GO

-- Parameterized dynamic SQL.
CREATE PROCEDURE testProc4(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
   DECLARE @sql NVARCHAR(4000) = N'SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate'
   DECLARE @param NVARCHAR(4000) = N'@startDate DATE, @endDate DATE'
   EXEC sp_executesql @sql, @param, @startDate = @startDate, @endDate = @endDate
END
GO

-- Hard coded dynamic SQL.
CREATE PROCEDURE testProc5(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
   DECLARE @sql NVARCHAR(4000) = N'SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN ''@startDate'' AND ''@endDate'''
   SET @sql = REPLACE(@sql, '@startDate', CONVERT(NCHAR(10), @startDate, 126))
   SET @sql = REPLACE(@sql, '@endDate', CONVERT(NCHAR(10), @endDate, 126))
   EXEC sp_executesql @sql
END
GO

-- Explicitly use DATEADD on a DATE.
CREATE PROCEDURE testProc6(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN DATEADD(DAY, -1, @endDate) AND @endDate
END
GO

-- Dummy parameter.
CREATE PROCEDURE testProc7(@endDate DATE, @startDate DATE = NULL) AS
BEGIN
   SET NOCOUNT ON
   SET @startDate = DATEADD(DAY, -1, @endDate)
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO

-- Explicitly use DATEADD on a DATETIME with implicit CAST for comparison with SaleDate.
-- Based on the answer from Mikael Eriksson.
CREATE PROCEDURE testProcA(@endDateTime DATETIME) AS
BEGIN
   SET NOCOUNT ON
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN DATEADD(DAY, -1, @endDateTime) AND @endDateTime
END
GO

-- Explicitly use DATEADD on a DATETIME but CAST to DATE for comparison with SaleDate.
-- Based on the answer from Mikael Eriksson.
CREATE PROCEDURE testProcB(@endDateTime DATETIME) AS
BEGIN
   SET NOCOUNT ON
   SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN CAST(DATEADD(DAY, -1, @endDateTime) AS DATE) AND CAST(@endDateTime AS DATE)
END
GO

-- Use a date lookup table, Sale first.
-- Based on the answer from Kenneth Fisher.
CREATE PROCEDURE testProcC(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
   SELECT SUM(Value) FROM Sale J INNER JOIN testDimDate D ON D.DateKey = J.SaleDate WHERE D.DateKey BETWEEN @startDate AND @endDate
END
GO

-- Use a date lookup table, Sale last.
-- Based on the answer from Kenneth Fisher.
CREATE PROCEDURE testProcD(@endDate DATE) AS
BEGIN
   SET NOCOUNT ON
   DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
   SELECT SUM(Value) FROM testDimDate D INNER JOIN Sale J ON J.SaleDate = D.DateKey WHERE D.DateKey BETWEEN @startDate AND @endDate
END
GO

------ TEST ------

SET STATISTICS TIME OFF

DECLARE @endDate DATE = '2012-12-10'
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)

DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

RAISERROR('Run 1: NoProc with constants', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
SET STATISTICS TIME OFF

RAISERROR('Run 2: NoProc with constants', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
SET STATISTICS TIME OFF

DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

RAISERROR('Run 1: NoProc with variables', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
SET STATISTICS TIME OFF

RAISERROR('Run 2: NoProc with variables', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
SET STATISTICS TIME OFF

DECLARE @sql NVARCHAR(4000)

DECLARE _cursor CURSOR LOCAL FAST_FORWARD FOR
   SELECT
      procedures.name,
      procedures.object_id
   FROM sys.procedures
   WHERE procedures.name LIKE 'testProc_'
   ORDER BY procedures.name ASC

OPEN _cursor

DECLARE @name SYSNAME
DECLARE @object_id INT

FETCH NEXT FROM _cursor INTO @name, @object_id
WHILE @@FETCH_STATUS = 0
BEGIN
   SET @sql = CASE (SELECT COUNT(*) FROM sys.parameters WHERE object_id = @object_id)
      WHEN 0 THEN @name
      WHEN 1 THEN @name + ' ''@endDate'''
      WHEN 2 THEN @name + ' ''@startDate'', ''@endDate'''
   END

   SET @sql = REPLACE(@sql, '@name', @name)
   SET @sql = REPLACE(@sql, '@startDate', CONVERT(NVARCHAR(10), @startDate, 126))
   SET @sql = REPLACE(@sql, '@endDate', CONVERT(NVARCHAR(10), @endDate, 126))

   DBCC FREEPROCCACHE WITH NO_INFOMSGS
   DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS

   RAISERROR('Run 1: %s', 0, 0, @sql) WITH NOWAIT
   SET STATISTICS TIME ON
   EXEC sp_executesql @sql
   SET STATISTICS TIME OFF

   RAISERROR('Run 2: %s', 0, 0, @sql) WITH NOWAIT
   SET STATISTICS TIME ON
   EXEC sp_executesql @sql
   SET STATISTICS TIME OFF

   FETCH NEXT FROM _cursor INTO @name, @object_id
END

CLOSE _cursor
DEALLOCATE _cursor

Best Answer

Parameter sniffing is your friend almost all of the time and you should write your queries so that it can be used. Parameter sniffing helps building the plan for you using the parameter values available when the query is compiled. The dark side of parameter sniffing is when the values used when compiling the query is not optimal for the queries to come.

The query in a stored procedure is compiled when the stored procedure is executed, not when the query is executed so the values that SQL Server has to deal with here...

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
END

is a known value for @endDate and an unknown value for @startDate. That will leave SQL Server to guessing on 30% of the rows returned for the filter on @startDate combined with whatever the statistics tells it for @endDate. If you have a big table with a lot of rows that could give you a scan operation where you would benefit most from a seek.

Your wrapper procedure solution makes sure that SQL Server sees the values when DateRangeProc is compiled so it can use known values for both @endDate and @startDate.

Both your dynamic queries leads to the same thing, the values are known at compile-time.

The one with a default null value is a bit special. The values known to SQL Server at compile-time is a known value for @endDate and null for @startDate. Using a null in a between will give you 0 rows but SQL Server always guess at 1 in those cases. That might be a good thing in this case but if you call the stored procedure with a large date interval where a scan would have been the best choice it may end up doing a bunch of seeks.

I left "Use the DATEADD() function directly" to the end of this answer because it is the one I would use and there is something strange with it as well.

First off, SQL Server does not call the function multiple times when it is used in the where clause. DATEADD is considered runtime constant.

And I would think that DATEADD is evaluated when the query is compiled so that you would get a good estimate on the number of rows returned. But it is not so in this case.
SQL Server estimates based on the value in the parameter regardless of what you do with DATEADD (tested on SQL Server 2012) so in your case the estimate will be the number of rows that is registered on @endDate. Why it does that I don't know but it has to do with the use of the datatype DATE. Shift to DATETIME in the stored procedure and the table and the estimate will be accurate, meaning that DATEADD is considered at compile time for DATETIME not for DATE.

So to summarize this rather lengthy answer I would recommend the wrapper procedure solution. It will always allow SQL Server to use the values provided when compiling the the query without the hassle of using dynamic SQL.

PS:

In comments you got two suggestions.

OPTION (OPTIMIZE FOR UNKNOWN) will give you an estimate of 9% of rows returned and OPTION (RECOMPILE) will make SQL Server see the parameter values since the query is recompiled every time.