Given a (simplified) stored procedure such as this:
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
SELECT
-- Stuff
FROM Sale
WHERE SaleDate BETWEEN @startDate AND @endDate
END
If the Sale
table is large the SELECT
can take a long time to execute, apparently because the optimizer can't optimize due to the local variable. We tested running the SELECT
part with variables then hard coded dates and the execution time went from ~9 minutes to ~1 second.
We have numerous stored procedures that query based on "fixed" date ranges (week, month, 8-week etc) so the input parameter is just @endDate and @startDate is calculated inside the procedure.
The question is, what is the best practice for avoiding variables in a WHERE clause so as not to compromise the optimizer?
The possibilities we came up with are shown below. Are any of these best practice, or is there another way?
Use a wrapper procedure to turn the variables into parameters.
Parameters don't affect the optimizer the same way local variables do.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
EXECUTE DateRangeProc @startDate, @endDate
END
CREATE PROCEDURE DateRangeProc(@startDate DATE, @endDate DATE)
AS
BEGIN
SELECT
-- Stuff
FROM Sale
WHERE SaleDate BETWEEN @startDate AND @endDate
END
Use parameterized dynamic SQL.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
DECLARE @sql NVARCHAR(4000) = N'
SELECT
-- Stuff
FROM Sale
WHERE SaleDate BETWEEN @startDate AND @endDate
'
DECLARE @param NVARCHAR(4000) = N'@startDate DATE, @endDate DATE'
EXECUTE sp_executesql @sql, @param, @startDate = @startDate, @endDate = @endDate
END
Use "hard-coded" dynamic SQL.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
DECLARE @sql NVARCHAR(4000) = N'
SELECT
-- Stuff
FROM Sale
WHERE SaleDate BETWEEN @startDate AND @endDate
'
SET @sql = REPLACE(@sql, '@startDate', CONVERT(NCHAR(10), @startDate, 126))
SET @sql = REPLACE(@sql, '@endDate', CONVERT(NCHAR(10), @endDate, 126))
EXECUTE sp_executesql @sql
END
Use the DATEADD()
function directly.
I'm not keen on this because calling functions in the WHERE also affects performance.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
SELECT
-- Stuff
FROM Sale
WHERE SaleDate BETWEEN DATEADD(DAY, -6, @endDate) AND @endDate
END
Use an optional parameter.
I'm not sure if assigning to parameters would have the same problem as assigning to variables, so this might not be an option. I don't really like this solution but including it for completeness.
CREATE PROCEDURE WeeklyProc(@endDate DATE, @startDate DATE = NULL)
AS
BEGIN
SET @startDate = DATEADD(DAY, -6, @endDate)
SELECT
-- Stuff
FROM Sale
WHERE SaleDate BETWEEN @startDate AND @endDate
END
— Update —
Thanks for suggestions and comments. After reading them I ran some timing tests with the various approaches. I'm adding the results here as a reference.
Run 1 is without a plan. Run 2 is immediately after Run 1 with exactly the same parameters so it will use the plan from run 1.
The NoProc times are for running the SELECT queries manually in SSMS outside a stored procedure.
TestProc1-7 are the queries from the original question.
TestProcA-B are based on the suggestion by Mikael Eriksson. The column in the database is a DATE so I tried passing the parameter as a DATETIME and running with implicit casting (testProcA) and explicit casting (testProcB).
TestProcC-D are based on the suggestion by Kenneth Fisher. We already use a date lookup table for other things, but we don't have one with a specific column for each period range. The variation I tried still uses BETWEEN but does it on the smaller lookup table and joins to the larger table. I'm going to investigate further as to whether we can use specific lookup tables, although our periods are fixed there are quite a few different ones.
Total rows in Sale table: 136,424,366 Run 1 (ms) Run 2 (ms) Procedure CPU Elapsed CPU Elapsed Comment NoProc constants 6567 62199 2870 719 Manual query with constants NoProc variables 9314 62424 3993 998 Manual query with variables testProc1 6801 62919 2871 736 Hard coded range testProc2 8955 63190 3915 979 Parameter and variable range testProc3 8985 63152 3932 987 Wrapper procedure with parameter range testProc4 9142 63939 3931 977 Parameterized dynamic SQL testProc5 7269 62933 2933 728 Hard coded dynamic SQL testProc6 9266 63421 3915 984 Use DATEADD on DATE testProc7 2044 13950 1092 1087 Dummy parameter testProcA 12120 61493 5491 1875 Use DATEADD on DATETIME without CAST testProcB 8612 61949 3932 978 Use DATEADD on DATETIME with CAST testProcC 8861 61651 3917 993 Use lookup table, Sale first testProcD 8625 61740 3994 1031 Use lookup table, Sale last
Here's the test code.
------ SETUP ------
IF OBJECT_ID(N'testDimDate', N'U') IS NOT NULL DROP TABLE testDimDate
IF OBJECT_ID(N'testProc1', N'P') IS NOT NULL DROP PROCEDURE testProc1
IF OBJECT_ID(N'testProc2', N'P') IS NOT NULL DROP PROCEDURE testProc2
IF OBJECT_ID(N'testProc3', N'P') IS NOT NULL DROP PROCEDURE testProc3
IF OBJECT_ID(N'testProc3a', N'P') IS NOT NULL DROP PROCEDURE testProc3a
IF OBJECT_ID(N'testProc4', N'P') IS NOT NULL DROP PROCEDURE testProc4
IF OBJECT_ID(N'testProc5', N'P') IS NOT NULL DROP PROCEDURE testProc5
IF OBJECT_ID(N'testProc6', N'P') IS NOT NULL DROP PROCEDURE testProc6
IF OBJECT_ID(N'testProc7', N'P') IS NOT NULL DROP PROCEDURE testProc7
IF OBJECT_ID(N'testProcA', N'P') IS NOT NULL DROP PROCEDURE testProcA
IF OBJECT_ID(N'testProcB', N'P') IS NOT NULL DROP PROCEDURE testProcB
IF OBJECT_ID(N'testProcC', N'P') IS NOT NULL DROP PROCEDURE testProcC
IF OBJECT_ID(N'testProcD', N'P') IS NOT NULL DROP PROCEDURE testProcD
GO
CREATE TABLE testDimDate
(
DateKey DATE NOT NULL,
CONSTRAINT PK_DimDate_DateKey UNIQUE NONCLUSTERED (DateKey ASC)
)
GO
DECLARE @dateTimeStart DATETIME = '2000-01-01'
DECLARE @dateTimeEnd DATETIME = '2100-01-01'
;WITH CTE AS
(
--Anchor member defined
SELECT @dateTimeStart FullDate
UNION ALL
--Recursive member defined referencing CTE
SELECT FullDate + 1 FROM CTE WHERE FullDate + 1 <= @dateTimeEnd
)
SELECT
CAST(FullDate AS DATE) AS DateKey
INTO #DimDate
FROM CTE
OPTION (MAXRECURSION 0)
INSERT INTO testDimDate (DateKey)
SELECT DateKey FROM #DimDate ORDER BY DateKey ASC
DROP TABLE #DimDate
GO
-- Hard coded date range.
CREATE PROCEDURE testProc1 AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
END
GO
-- Parameter and variable date range.
CREATE PROCEDURE testProc2(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO
-- Parameter date range.
CREATE PROCEDURE testProc3a(@startDate DATE, @endDate DATE) AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO
-- Wrapper procedure.
CREATE PROCEDURE testProc3(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
EXEC testProc3a @startDate, @endDate
END
GO
-- Parameterized dynamic SQL.
CREATE PROCEDURE testProc4(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
DECLARE @sql NVARCHAR(4000) = N'SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate'
DECLARE @param NVARCHAR(4000) = N'@startDate DATE, @endDate DATE'
EXEC sp_executesql @sql, @param, @startDate = @startDate, @endDate = @endDate
END
GO
-- Hard coded dynamic SQL.
CREATE PROCEDURE testProc5(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
DECLARE @sql NVARCHAR(4000) = N'SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN ''@startDate'' AND ''@endDate'''
SET @sql = REPLACE(@sql, '@startDate', CONVERT(NCHAR(10), @startDate, 126))
SET @sql = REPLACE(@sql, '@endDate', CONVERT(NCHAR(10), @endDate, 126))
EXEC sp_executesql @sql
END
GO
-- Explicitly use DATEADD on a DATE.
CREATE PROCEDURE testProc6(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN DATEADD(DAY, -1, @endDate) AND @endDate
END
GO
-- Dummy parameter.
CREATE PROCEDURE testProc7(@endDate DATE, @startDate DATE = NULL) AS
BEGIN
SET NOCOUNT ON
SET @startDate = DATEADD(DAY, -1, @endDate)
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
END
GO
-- Explicitly use DATEADD on a DATETIME with implicit CAST for comparison with SaleDate.
-- Based on the answer from Mikael Eriksson.
CREATE PROCEDURE testProcA(@endDateTime DATETIME) AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN DATEADD(DAY, -1, @endDateTime) AND @endDateTime
END
GO
-- Explicitly use DATEADD on a DATETIME but CAST to DATE for comparison with SaleDate.
-- Based on the answer from Mikael Eriksson.
CREATE PROCEDURE testProcB(@endDateTime DATETIME) AS
BEGIN
SET NOCOUNT ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN CAST(DATEADD(DAY, -1, @endDateTime) AS DATE) AND CAST(@endDateTime AS DATE)
END
GO
-- Use a date lookup table, Sale first.
-- Based on the answer from Kenneth Fisher.
CREATE PROCEDURE testProcC(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
SELECT SUM(Value) FROM Sale J INNER JOIN testDimDate D ON D.DateKey = J.SaleDate WHERE D.DateKey BETWEEN @startDate AND @endDate
END
GO
-- Use a date lookup table, Sale last.
-- Based on the answer from Kenneth Fisher.
CREATE PROCEDURE testProcD(@endDate DATE) AS
BEGIN
SET NOCOUNT ON
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
SELECT SUM(Value) FROM testDimDate D INNER JOIN Sale J ON J.SaleDate = D.DateKey WHERE D.DateKey BETWEEN @startDate AND @endDate
END
GO
------ TEST ------
SET STATISTICS TIME OFF
DECLARE @endDate DATE = '2012-12-10'
DECLARE @startDate DATE = DATEADD(DAY, -1, @endDate)
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
RAISERROR('Run 1: NoProc with constants', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
SET STATISTICS TIME OFF
RAISERROR('Run 2: NoProc with constants', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN '2012-12-09' AND '2012-12-10'
SET STATISTICS TIME OFF
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
RAISERROR('Run 1: NoProc with variables', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
SET STATISTICS TIME OFF
RAISERROR('Run 2: NoProc with variables', 0, 0) WITH NOWAIT
SET STATISTICS TIME ON
SELECT SUM(Value) FROM Sale WHERE SaleDate BETWEEN @startDate AND @endDate
SET STATISTICS TIME OFF
DECLARE @sql NVARCHAR(4000)
DECLARE _cursor CURSOR LOCAL FAST_FORWARD FOR
SELECT
procedures.name,
procedures.object_id
FROM sys.procedures
WHERE procedures.name LIKE 'testProc_'
ORDER BY procedures.name ASC
OPEN _cursor
DECLARE @name SYSNAME
DECLARE @object_id INT
FETCH NEXT FROM _cursor INTO @name, @object_id
WHILE @@FETCH_STATUS = 0
BEGIN
SET @sql = CASE (SELECT COUNT(*) FROM sys.parameters WHERE object_id = @object_id)
WHEN 0 THEN @name
WHEN 1 THEN @name + ' ''@endDate'''
WHEN 2 THEN @name + ' ''@startDate'', ''@endDate'''
END
SET @sql = REPLACE(@sql, '@name', @name)
SET @sql = REPLACE(@sql, '@startDate', CONVERT(NVARCHAR(10), @startDate, 126))
SET @sql = REPLACE(@sql, '@endDate', CONVERT(NVARCHAR(10), @endDate, 126))
DBCC FREEPROCCACHE WITH NO_INFOMSGS
DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS
RAISERROR('Run 1: %s', 0, 0, @sql) WITH NOWAIT
SET STATISTICS TIME ON
EXEC sp_executesql @sql
SET STATISTICS TIME OFF
RAISERROR('Run 2: %s', 0, 0, @sql) WITH NOWAIT
SET STATISTICS TIME ON
EXEC sp_executesql @sql
SET STATISTICS TIME OFF
FETCH NEXT FROM _cursor INTO @name, @object_id
END
CLOSE _cursor
DEALLOCATE _cursor
Best Answer
Parameter sniffing is your friend almost all of the time and you should write your queries so that it can be used. Parameter sniffing helps building the plan for you using the parameter values available when the query is compiled. The dark side of parameter sniffing is when the values used when compiling the query is not optimal for the queries to come.
The query in a stored procedure is compiled when the stored procedure is executed, not when the query is executed so the values that SQL Server has to deal with here...
is a known value for
@endDate
and an unknown value for@startDate
. That will leave SQL Server to guessing on 30% of the rows returned for the filter on@startDate
combined with whatever the statistics tells it for@endDate
. If you have a big table with a lot of rows that could give you a scan operation where you would benefit most from a seek.Your wrapper procedure solution makes sure that SQL Server sees the values when
DateRangeProc
is compiled so it can use known values for both@endDate
and@startDate
.Both your dynamic queries leads to the same thing, the values are known at compile-time.
The one with a default null value is a bit special. The values known to SQL Server at compile-time is a known value for
@endDate
andnull
for@startDate
. Using anull
in a between will give you 0 rows but SQL Server always guess at 1 in those cases. That might be a good thing in this case but if you call the stored procedure with a large date interval where a scan would have been the best choice it may end up doing a bunch of seeks.I left "Use the DATEADD() function directly" to the end of this answer because it is the one I would use and there is something strange with it as well.
First off, SQL Server does not call the function multiple times when it is used in the where clause. DATEADD is considered runtime constant.
And I would think that
DATEADD
is evaluated when the query is compiled so that you would get a good estimate on the number of rows returned. But it is not so in this case.SQL Server estimates based on the value in the parameter regardless of what you do with
DATEADD
(tested on SQL Server 2012) so in your case the estimate will be the number of rows that is registered on@endDate
. Why it does that I don't know but it has to do with the use of the datatypeDATE
. Shift toDATETIME
in the stored procedure and the table and the estimate will be accurate, meaning thatDATEADD
is considered at compile time forDATETIME
not forDATE
.So to summarize this rather lengthy answer I would recommend the wrapper procedure solution. It will always allow SQL Server to use the values provided when compiling the the query without the hassle of using dynamic SQL.
PS:
In comments you got two suggestions.
OPTION (OPTIMIZE FOR UNKNOWN)
will give you an estimate of 9% of rows returned andOPTION (RECOMPILE)
will make SQL Server see the parameter values since the query is recompiled every time.