Sql-server – how to get a better execution plan for empty sets

execution-planoptimizationsql server

I have a query I'm trying to optimize, something like this:

SELECT 1 FROM HugeView
WHERE (Col1 = 'a' AND @val = 1) OR (Col2 = 'a' AND @Val = 0)

(where Col1 and Col2 are on different tables)

If I hardcode the @Val value (1/0) – SQL Server knows to build an execution plan in which it only accesses the relevant table for either Col1 or Col2.

But when using a variable all the tables in HugeView are accessed.

Can you advise on a way of doing it in a way that would help SQL Server not access the unecessary table?

Limitations:

  • Can't use option (recompile)
  • Can't encapsulate this code in an SP
  • Can't change HugeView

Unfortunately all aforementioned limitations are part of the product's design and/or dev scope – nothing I can do about that – just work within what I have.

I do however know that @var is either 1 or 0, and can create and query (and join on) temp or "real" tables as I like.

Also notice I don't select from the tables of Col1 or Col2 – I only use them for filtering.

I tried creating and joining on empty tables, or top(@somevar) depending on @var's value – but didn't help.

UNION ALL won't help as in any case the optimizer doesn't know the value of @val. Also can't change the query – if I could then of course all would've been solved.

Best Answer

To reproduce your issue, I created three tables with 1000 rows. The view definition does left outer joins on the primary keys of the table which should allow for join elimination. Here's the code:

CREATE TABLE dbo.BASE_TABLE (ID BIGINT NOT NULL, PRIMARY KEY (ID));
INSERT INTO dbo.BASE_TABLE WITH (TABLOCK)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values;

CREATE TABLE dbo.COL1_TABLE (ID BIGINT NOT NULL, COL1 VARCHAR(1), PRIMARY KEY (ID));
INSERT INTO dbo.COL1_TABLE WITH (TABLOCK)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'A'
FROM master..spt_values;

CREATE TABLE dbo.COL2_TABLE (ID BIGINT NOT NULL, COL2 VARCHAR(1), PRIMARY KEY (ID));
INSERT INTO dbo.COL2_TABLE WITH (TABLOCK)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'A'
FROM master..spt_values;

GO

CREATE VIEW HugeView AS
SELECT b.ID AS B_ID, c1.COL1, c2.COL2
FROM dbo.BASE_TABLE b
LEFT OUTER JOIN dbo.COL1_TABLE c1 ON b.ID = c1.ID
LEFT OUTER JOIN dbo.COL2_TABLE c2 ON b.ID = c2.ID;

GO

The following query access all three tables when it shouldn't have to:

DECLARE @Val BIGINT = 1;
SELECT 1 FROM dbo.HugeView
WHERE (Col1 = 'A' AND @val = 1) OR (Col2 = 'A' AND @Val = 0);

Note the thickness of the arrows in the actual plan:

repro plan

If I add a RECOMPILE hint then I only get two tables in the actual plan as desired:

recompile fix

Probably the most straightforward fix is to use UNION ALL. Here's the estimated plan:

UNION ALL estimated

At first the plan might look bad because it references all three tables. However, the highlighted startup predicate filter is important:

filter

What that means is that branch of the query plan may not be executed depending on the value of the parameter. If I get the actual plan you can see that only one half of the plan was executed:

fix actual

dbfiddle link for everything in this post.

Note that depending on the complexity of the view the startup filter may not appear in the optimal place. However it should save you some amount of unnecessary work.