Sql-server – Conditional WHERE clause

conditionsql serversql-server-2005

I have, from what I can tell, a unique question. I would like to know if there's a way to construct a query in which the existence of the actual where clause itself is conditional.

I'm writing a stored procedure where I would like to pass in a filter (e.g. Col1=1 or Col2=3) and my query looks something like the following:

set @filter = 'Col1=1 or Col2=3' --assume these variables have been declared already
select * from @tbl where @filter --I need to use a table variable

Note: I tried using sp_executesql and sp_sqlexec but neither take a table variable. I would also like to avoid using an if-else statement. Please do not suggest either of these as your answer. 🙂

Best Answer

Dynamic SQL won't be able to see your table variable unless you also declare it and populate it within the same dynamic SQL scope. A #temp table will work fine, and I'm not sure why you "need" to use a table variable, but you can always do this:

SELECT * INTO #tbl FROM @tbl;

Anyway assuming you can change your process:

CREATE TABLE #tbl(Col1 INT, Col2 INT);

INSERT #tbl SELECT 1,4
  UNION ALL SELECT 2,3
  UNION ALL SELECT 5,8;

DECLARE @sql NVARCHAR(MAX);
DECLARE @filter NVARCHAR(255);

SET @sql = N'SELECT Col1, Col2 FROM #tbl';

SET @filter = N'Col1 = 1 OR Col2 = 3';

SET @sql = @sql + COALESCE(' WHERE ' + @filter, '');

EXEC sp_executesql @sql;

DROP TABLE #tbl;

Results:

Col1        Col2
----------- -----------
1           4
2           3

As an aside, you cannot do things like this:

select * from @tbl where @filter

You need to build such statements dynamically. SQL Server won't see @tbl or @filter as entities or where clauses.

Related Solutions

Sql-server – the most efficient way to get the minimum of multiple columns on SQL Server 2005

I tested the performance of all 3 methods, and here's what I found:

1 record: No noticeable difference
10 records: No noticeable difference
1,000 records: No noticeable difference
10,000 records: UNION subquery was a little slower. The CASE WHEN query is a little faster than the UNPIVOT one.
100,000 records: UNION subquery is significantly slower, but UNPIVOT query becomes a little faster than the CASE WHEN query
500,000 records: UNION subquery still significantly slower, but UNPIVOT becomes much faster than the CASE WHEN query

So the end results seems to be

With smaller record sets there doesn't seem to be enough of a difference to matter. Use whatever is easiest to read and maintain.
Once you start getting into larger record sets, the UNION ALL subquery begins to perform poorly compared to the other two methods.
The CASE statement performs the best up until a certain point (in my case, around 100k rows), and which point the UNPIVOT query becomes the best-performing query

The actual number at which one query becomes better than another will probably change as a result of your hardware, database schema, data, and current server load, so be sure to test with your own system if you're concerned about performance.

I also ran some tests using Mikael's answer; however, it was slower than all 3 of the other methods tried here for most recordset sizes. The only exception was it did better than a the UNION ALL query for very large recordset sizes. I like the fact it shows the column name in addition to the smallest value though.

I'm not a dba, so I may not have optimized my tests and missed something. I was testing with the actual live data, so that may have affected the results. I tried to account for that by running each query a few different times, but you never know. I would definitely be interested if someone wrote up a clean test of this and shared their results.

How to Avoid Using Variables in SQL Server WHERE Clause

Parameter sniffing is your friend almost all of the time and you should write your queries so that it can be used. Parameter sniffing helps building the plan for you using the parameter values available when the query is compiled. The dark side of parameter sniffing is when the values used when compiling the query is not optimal for the queries to come.

The query in a stored procedure is compiled when the stored procedure is executed, not when the query is executed so the values that SQL Server has to deal with here...

CREATE PROCEDURE WeeklyProc(@endDate DATE)
AS
BEGIN
  DECLARE @startDate DATE = DATEADD(DAY, -6, @endDate)
  SELECT
    -- Stuff
  FROM Sale
  WHERE SaleDate BETWEEN @startDate AND @endDate
END

is a known value for @endDate and an unknown value for @startDate. That will leave SQL Server to guessing on 30% of the rows returned for the filter on @startDate combined with whatever the statistics tells it for @endDate. If you have a big table with a lot of rows that could give you a scan operation where you would benefit most from a seek.

Your wrapper procedure solution makes sure that SQL Server sees the values when DateRangeProc is compiled so it can use known values for both @endDate and @startDate.

Both your dynamic queries leads to the same thing, the values are known at compile-time.

The one with a default null value is a bit special. The values known to SQL Server at compile-time is a known value for @endDate and null for @startDate. Using a null in a between will give you 0 rows but SQL Server always guess at 1 in those cases. That might be a good thing in this case but if you call the stored procedure with a large date interval where a scan would have been the best choice it may end up doing a bunch of seeks.

I left "Use the DATEADD() function directly" to the end of this answer because it is the one I would use and there is something strange with it as well.

First off, SQL Server does not call the function multiple times when it is used in the where clause. DATEADD is considered runtime constant.

And I would think that DATEADD is evaluated when the query is compiled so that you would get a good estimate on the number of rows returned. But it is not so in this case.
SQL Server estimates based on the value in the parameter regardless of what you do with DATEADD (tested on SQL Server 2012) so in your case the estimate will be the number of rows that is registered on @endDate. Why it does that I don't know but it has to do with the use of the datatype DATE. Shift to DATETIME in the stored procedure and the table and the estimate will be accurate, meaning that DATEADD is considered at compile time for DATETIME not for DATE.

So to summarize this rather lengthy answer I would recommend the wrapper procedure solution. It will always allow SQL Server to use the values provided when compiling the the query without the hassle of using dynamic SQL.

PS:

In comments you got two suggestions.

OPTION (OPTIMIZE FOR UNKNOWN) will give you an estimate of 9% of rows returned and OPTION (RECOMPILE) will make SQL Server see the parameter values since the query is recompiled every time.

Best Answer

Related Solutions

Sql-server – the most efficient way to get the minimum of multiple columns on SQL Server 2005

How to Avoid Using Variables in SQL Server WHERE Clause

Related Question