SELECT
    Number,
    CAST(LOG(Number, 10) AS VARBINARY) AS LogAB,
    CAST(LOG10(Number) AS VARBINARY) AS LogTen,
    CAST(LOG(Number) / LOG(10) AS VARBINARY) AS LogOverLog
FROM (
    VALUES (1000)
) AS Tally (Number);
Returns
Number LogAB LogTen LogOverLog
----------- ----------------------- ----------------------- ----------------------
1000 0x4007FFFFFFFFFFFF 0x4008000000000000 0x4007FFFFFFFFFFFF
0x4008000000000000 is exactly 3. 0x4007FFFFFFFFFFFF is 2.99999999999999955591079014994.
If you are looking for an efficient expression, a CASE expression with the 10 different cases might actually work out less CPU-intensive than calculating logarithms (or possibly you could use nested CASE expressions to do a ternary search).
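For illustration, a minimal sketch of that idea, assuming the goal is the integer part of the base-10 logarithm (one less than the digit count) of a positive integer; the Tally/Number names just mirror the example above:

-- Sketch only: integer base-10 logarithm via CASE instead of LOG().
-- Assumes Number is a positive INT up to 10 digits.
SELECT
    Number,
    CASE
        WHEN Number >= 1000000000 THEN 9
        WHEN Number >= 100000000  THEN 8
        WHEN Number >= 10000000   THEN 7
        WHEN Number >= 1000000    THEN 6
        WHEN Number >= 100000     THEN 5
        WHEN Number >= 10000      THEN 4
        WHEN Number >= 1000       THEN 3
        WHEN Number >= 100        THEN 2
        WHEN Number >= 10         THEN 1
        ELSE 0
    END AS Log10Floor
FROM (VALUES (1000)) AS Tally (Number);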
Parameter sniffing is your friend almost all of the time, and you should write your queries so that it can be used. Parameter sniffing helps build the plan for you using the parameter values available when the query is compiled. The dark side of parameter sniffing is when the values used when compiling the query are not optimal for the queries to come.
The query in a stored procedure is compiled when the stored procedure is executed, not when the query is executed, so the values that SQL Server has to deal with here...
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
    DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
    SELECT
        -- Stuff
    FROM Sale
    WHERE SaleDate BETWEEN @startDate AND @endDate
END
...are a known value for @endDate and an unknown value for @startDate. That will leave SQL Server guessing at 30% of the rows returned for the filter on @startDate, combined with whatever the statistics tell it for @endDate. If you have a big table with a lot of rows, that could give you a scan operation where you would benefit most from a seek.
Your wrapper procedure solution makes sure that SQL Server sees the values when DateRangeProc is compiled, so it can use known values for both @endDate and @startDate.
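For reference, a minimal sketch of what I mean by that pattern; DateRangeProc is the procedure from your question, so its exact signature and parameter order are assumptions on my part:

-- Sketch only: the wrapper computes the start date and passes both values on,
-- so DateRangeProc is compiled with known values for both parameters.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
    DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate);
    EXEC DateRangeProc @startDate, @endDate;  -- signature assumed
END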
Both your dynamic queries lead to the same thing: the values are known at compile time.
The one with a default null value is a bit special. The values known to SQL Server at compile time are a known value for @endDate and null for @startDate. Using a null in a BETWEEN will give you 0 rows, but SQL Server always guesses at 1 in those cases. That might be a good thing in this case, but if you call the stored procedure with a large date interval where a scan would have been the best choice, it may end up doing a bunch of seeks.
I left "Use the DATEADD() function directly" to the end of this answer because it is the one I would use, and there is something strange about it as well.
First off, SQL Server does not call the function multiple times when it is used in the WHERE clause; DATEADD is considered a runtime constant.
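In other words, with the DATEADD() call inlined (a sketch of that option, reusing the Sale table from above; the column shown is a placeholder), the function is folded to a constant at runtime rather than evaluated per row:

-- Sketch only: DATEADD used directly in the WHERE clause.
CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
    SELECT
        SaleDate  -- placeholder for the real column list
    FROM Sale
    WHERE SaleDate BETWEEN DATEADD(DAY, -6, @endDate) AND @endDate;
END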
And I would think that DATEADD is evaluated when the query is compiled, so that you would get a good estimate of the number of rows returned. But it is not so in this case.
SQL Server estimates based on the value in the parameter regardless of what you do with DATEADD (tested on SQL Server 2012), so in your case the estimate will be the number of rows that are registered for @endDate. Why it does that I don't know, but it has to do with the use of the datatype DATE. Shift to DATETIME in the stored procedure and the table and the estimate will be accurate, meaning that DATEADD is considered at compile time for DATETIME but not for DATE.
So to summarize this rather lengthy answer, I would recommend the wrapper procedure solution. It will always allow SQL Server to use the values provided when compiling the query, without the hassle of using dynamic SQL.
PS: In the comments you got two suggestions. OPTION (OPTIMIZE FOR UNKNOWN) will give you an estimate of 9% of the rows returned, and OPTION (RECOMPILE) will make SQL Server see the parameter values since the query is recompiled every time.
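For completeness, this is where those hints would go on the statement inside the procedure above (just a sketch; the column is a placeholder):

-- Sketch only: either hint is appended to the statement.
SELECT
    SaleDate  -- placeholder for the real column list
FROM Sale
WHERE SaleDate BETWEEN @startDate AND @endDate
OPTION (RECOMPILE);
-- or: OPTION (OPTIMIZE FOR UNKNOWN)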
Best Answer
It sounds like a situation where you could potentially fall prey to - or benefit from - parameter sniffing. In a nutshell, SQL Server will use parameter/variable values when compiling an execution plan in order to determine optimal index usage (based on column statistics). Depending on the value that's passed in, you could get very different execution plans, and very poor performance if the "wrong" value is used later. (A nice article about it.)
I'd personally calculate the date within the procedure using a scalar function, unless there's a need for whatever is calling this procedure to specify the date itself. Then you could always use the scalar function in the procedure to fall back on a default value. As long as parameter sniffing isn't giving you surprising performance changes, it's mostly just a matter of application design.
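A minimal sketch of that approach, reusing the Sale table from the other answer; the function name dbo.DefaultEndDate, the choice of today as the default, and the column shown are all assumptions for illustration:

-- Sketch only: hypothetical scalar function supplying the default date.
CREATE FUNCTION dbo.DefaultEndDate()
RETURNS DATE
AS
BEGIN
    RETURN CAST(GETDATE() AS DATE);  -- e.g. default to today
END
GO

CREATE PROCEDURE WeeklyProc(@endDate DATE = NULL)
AS
BEGIN
    -- Fall back on the function when the caller does not supply a date.
    SET @endDate = ISNULL(@endDate, dbo.DefaultEndDate());
    DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate);

    SELECT
        SaleDate  -- placeholder for the real column list
    FROM Sale
    WHERE SaleDate BETWEEN @startDate AND @endDate;
END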