Sql-server – how to get a better execution plan for empty sets

execution-plan optimization sql-server

I have a query I'm trying to optimize, something like this:

SELECT 1 FROM HugeView
WHERE (Col1 = 'a' AND @Val = 1) OR (Col2 = 'a' AND @Val = 0)

(where Col1 and Col2 are columns from different tables)

If I hard-code the value of @Val (1 or 0), SQL Server builds an execution plan that accesses only the table holding the relevant column, Col1 or Col2.

But when I use a variable, all of the tables in HugeView are accessed.

How can I help SQL Server avoid accessing the unnecessary table?

Limitations:

Can't use OPTION (RECOMPILE)
Can't encapsulate this code in a stored procedure
Can't change HugeView

Unfortunately, all of these limitations are part of the product's design and/or development scope. There is nothing I can do about them, so I have to work within what I have.

I do, however, know that @Val is always either 1 or 0, and I can create, query, and join to temp or "real" tables as I like.

Also note that I don't select anything from the tables that hold Col1 or Col2; I only use them for filtering.

I tried creating and joining to empty tables, and using TOP (@somevar) depending on the value of @Val, but neither helped.

UNION ALL won't help, because in either case the optimizer doesn't know the value of @Val. I also can't change the query; if I could, this would all be solved.

Best Answer

To reproduce your issue, I created three tables with 1000 rows each. The view definition uses left outer joins on the primary keys of the tables, which should allow for join elimination. Here's the code:

CREATE TABLE dbo.BASE_TABLE (ID BIGINT NOT NULL, PRIMARY KEY (ID));
INSERT INTO dbo.BASE_TABLE WITH (TABLOCK)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM master..spt_values;

CREATE TABLE dbo.COL1_TABLE (ID BIGINT NOT NULL, COL1 VARCHAR(1), PRIMARY KEY (ID));
INSERT INTO dbo.COL1_TABLE WITH (TABLOCK)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'A'
FROM master..spt_values;

CREATE TABLE dbo.COL2_TABLE (ID BIGINT NOT NULL, COL2 VARCHAR(1), PRIMARY KEY (ID));
INSERT INTO dbo.COL2_TABLE WITH (TABLOCK)
SELECT TOP (1000) ROW_NUMBER() OVER (ORDER BY (SELECT NULL)), 'A'
FROM master..spt_values;

GO

CREATE VIEW HugeView AS
SELECT b.ID AS B_ID, c1.COL1, c2.COL2
FROM dbo.BASE_TABLE b
LEFT OUTER JOIN dbo.COL1_TABLE c1 ON b.ID = c1.ID
LEFT OUTER JOIN dbo.COL2_TABLE c2 ON b.ID = c2.ID;

GO

The following query accesses all three tables even though it shouldn't have to:

DECLARE @Val BIGINT = 1;
SELECT 1 FROM dbo.HugeView
WHERE (Col1 = 'A' AND @Val = 1) OR (Col2 = 'A' AND @Val = 0);

The thickness of the arrows in the actual plan shows that rows are read from all three tables.

If I add an OPTION (RECOMPILE) hint, only two tables appear in the actual plan, as desired.
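The RECOMPILE variant of the test query might look like the following sketch. It assumes the demo tables and dbo.HugeView created above. With OPTION (RECOMPILE), the optimizer can treat the current value of @Val as a constant. The branch for the other column then folds away, and the LEFT OUTER JOIN to the unused table can be eliminated.

```sql
DECLARE @Val BIGINT = 1;

-- OPTION (RECOMPILE) lets the optimizer embed the runtime value of @Val.
-- With @Val = 1, (Col2 = 'A' AND @Val = 0) folds to false, COL2 is no longer
-- referenced, and the LEFT JOIN to COL2_TABLE on its primary key can be removed.
SELECT 1
FROM dbo.HugeView
WHERE (Col1 = 'A' AND @Val = 1) OR (Col2 = 'A' AND @Val = 0)
OPTION (RECOMPILE);
```

This is the option the question rules out. It is shown only as the baseline plan shape that the UNION ALL rewrite tries to approximate.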

Probably the most straightforward fix is to rewrite the query with UNION ALL.
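Here is a sketch of what a UNION ALL rewrite could look like against the demo dbo.HugeView. It is an illustration and not necessarily the exact query behind the plans discussed here. It relies on the question's guarantee that @Val is always either 1 or 0. Under that assumption at most one branch can return rows, so UNION ALL returns the same rows as the original OR predicate without any duplicate removal.

```sql
DECLARE @Val BIGINT = 1;

-- Each branch carries only its own test on @Val. Because @Val is a variable,
-- both branches are still compiled, but the @Val comparison can become a
-- startup predicate, so a branch whose test is false is skipped at run time.
SELECT 1
FROM dbo.HugeView
WHERE Col1 = 'A' AND @Val = 1
UNION ALL
SELECT 1
FROM dbo.HugeView
WHERE Col2 = 'A' AND @Val = 0;
```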

At first the estimated plan might look bad because it references all three tables. However, the Filter operator with a startup expression predicate is the important part.

It means that a branch of the query plan may not be executed at all, depending on the value of the variable. In the actual plan, only one half of the plan was executed.
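Another way to check which branch ran, without reading the graphical plan, is to compare per-table I/O for both values of @Val. This sketch assumes the demo objects above and a UNION ALL rewrite in which each branch tests @Val separately. In the demo view, the skipped branch's tables should show zero (or no) logical reads in the Messages output, though the exact output format varies by version.

```sql
-- Report logical reads per table for each statement.
SET STATISTICS IO ON;

DECLARE @Val BIGINT = 1;
-- Expected: reads on BASE_TABLE and COL1_TABLE, none on COL2_TABLE.
SELECT 1 FROM dbo.HugeView WHERE Col1 = 'A' AND @Val = 1
UNION ALL
SELECT 1 FROM dbo.HugeView WHERE Col2 = 'A' AND @Val = 0;

SET @Val = 0;
-- Expected: reads on BASE_TABLE and COL2_TABLE, none on COL1_TABLE.
SELECT 1 FROM dbo.HugeView WHERE Col1 = 'A' AND @Val = 1
UNION ALL
SELECT 1 FROM dbo.HugeView WHERE Col2 = 'A' AND @Val = 0;

SET STATISTICS IO OFF;
```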

dbfiddle link for everything in this post.

Note that, depending on the complexity of the view, the startup filter may not appear in the optimal place. However, it should still save you some unnecessary work.

Related Solutions

Sql-server – Why would recompile query hint result in different plan for same adhoc statement after freeproccache

The query you posted contains variables.

SQL Server doesn't sniff the values of variables, so without OPTION (RECOMPILE) it compiles a general plan, as it would for OPTIMIZE FOR UNKNOWN.
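To see the three behaviours side by side, you can compare the estimated row counts in the actual plans of the statements below. This is a minimal sketch. The dbo.Orders table and its data are hypothetical and not from the original question, and DROP TABLE IF EXISTS needs SQL Server 2016 or later. The queries sit in their own batch after GO, so the table already exists when the batch is compiled, which happens before @StartDate is assigned.

```sql
-- Hypothetical demo table (not from the question): 1000 orders spread over 2023.
DROP TABLE IF EXISTS dbo.Orders;
CREATE TABLE dbo.Orders
(
    OrderID   INT IDENTITY PRIMARY KEY,
    OrderDate DATE NOT NULL
);
INSERT INTO dbo.Orders (OrderDate)
SELECT TOP (1000)
       DATEADD(DAY,
               CAST(ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) % 365 AS INT),
               CAST('20230101' AS DATE))
FROM master..spt_values;
CREATE INDEX IX_Orders_OrderDate ON dbo.Orders (OrderDate);
GO

DECLARE @StartDate DATE = '20231201';

-- 1. No hint: the value of @StartDate is not sniffed, so the estimate
--    comes from a generic guess, as with OPTIMIZE FOR UNKNOWN.
SELECT COUNT(*) FROM dbo.Orders WHERE OrderDate >= @StartDate;

-- 2. The same generic estimate, requested explicitly.
SELECT COUNT(*) FROM dbo.Orders WHERE OrderDate >= @StartDate
OPTION (OPTIMIZE FOR UNKNOWN);

-- 3. Compiled with the current value of @StartDate, so the
--    statistics histogram can be used for the estimate.
SELECT COUNT(*) FROM dbo.Orders WHERE OrderDate >= @StartDate
OPTION (RECOMPILE);
```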

I don't really follow your question though. At one point you seem to be saying that the version without the hint is "much much faster" and then later you say the version with the hint is "much better". So which one is it?

Both are explicable, however. If you find that the version with the hint is better, then this is because SQL Server can use the actual value of the variable, together with the statistics, to estimate the number of rows that will be matched by the date predicate and choose an appropriate plan for that case.

If the version without the hint is better the statistics themselves may need updating. Perhaps when they were last updated there were few or no rows meeting that predicate and so SQL Server massively underestimates the number of rows that will be returned. See Statistics, row estimations and the ascending date column for more about this potential issue.
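To check whether stale statistics are the cause, you can look at when they were last updated and how many rows have changed since, then refresh them. This is a sketch only. The table name dbo.Orders is a hypothetical stand-in for the table holding the date column, and sys.dm_db_stats_properties requires SQL Server 2008 R2 SP2 / 2012 SP1 or later.

```sql
-- Statistics on the (hypothetical) dbo.Orders table: last update time,
-- rows seen then, and modifications made since that update.
SELECT s.name AS stats_name,
       sp.last_updated,
       sp.rows,
       sp.rows_sampled,
       sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.Orders');

-- Rebuild the statistics from every row so the histogram covers the newest dates.
UPDATE STATISTICS dbo.Orders WITH FULLSCAN;
```

A full scan is the most accurate option but can be expensive on large tables. A sampled update (the default) is cheaper.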

MySQL query ‘going away’ on executing INSERT ON DUPLICATE UPDATE statement with a 12524 character blob

This sounds like you have to increase the size of your MySQL packets.

According to page 99 of "Understanding MySQL Internals" (ISBN 0-596-00957-7), paragraphs 1-3 explain it:

MySQL network communication code was written under the assumption that queries are always reasonably short, and therefore can be sent to and processed by the server in one chunk, which is called a packet in MySQL terminology. The server allocates the memory for a temporary buffer to store the packet, and it requests enough to fit it entirely. This architecture requires a precaution to avoid having the server run out of memory: a cap on the size of the packet, which this option accomplishes.

The code of interest in relation to this option is found in sql/net_serv.cc. Take a look at my_net_read(), then follow the call to my_real_read() and pay particular attention to net_realloc().

This variable also limits the length of a result of many string functions. See sql/field.cc and sql/item_strfunc.cc for details.

Since MySQL packets can hold rows of data, larger items in a packet can cause many packets to move in and out so that whole chunks of related data are not split during processing. This can silently kill DB connections for no apparent reason. In fact, I wrote a post about how this can affect certain mysqldumps.

Try increasing max_allowed_packet to 256M. The session value of this variable is read-only, so set the global value. It applies to connections opened after the change:

SET GLOBAL max_allowed_packet = 1024 * 1024 * 256;
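A value set with SET GLOBAL is lost when the server restarts. This sketch assumes MySQL 8.0 or later, where SET PERSIST also writes the setting to mysqld-auto.cnf in the data directory. It needs an administrative privilege such as SYSTEM_VARIABLES_ADMIN or SUPER. On older versions, set max_allowed_packet in the server's option file instead.

```sql
-- Current values in bytes; the session value is copied from the global one at connect time.
SELECT @@GLOBAL.max_allowed_packet, @@SESSION.max_allowed_packet;

-- MySQL 8.0+: set the global value to 256 MB and keep it across restarts.
SET PERSIST max_allowed_packet = 268435456;

-- Existing sessions keep their old value; reconnect and re-run the first SELECT to confirm.
```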
