SQL Server – Understanding performance of highly unique WHERE clauses

azure-sql-database, condition, optimization, performance, sql-server

I've been struggling to understand how to deal with a specific type of performance issue that shows up very often in this scenario: when you want to apply multiple filters in a query, but you know that the first filter will return only a very small number of rows from a very large table.

For example, we have a 3rd-party heterogeneous table with 10M+ rows where the columns mean different things based on the "object type" in TYPEID. Here is an example query:

SELECT ID, NAME, INT109 FROM DATA WHERE TYPEID = 8301514 AND INT109 = 1

In this query, there are no covering indexes for the two filters, but there is an index on the 'TYPEID' column. What is perplexing is that even though there are only about 500 rows out of the 10M in the table with TYPEID = 8301514, this query sometimes takes many seconds to run.
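
For reference, the existing index is just a single-column index on TYPEID, something like the statement below (the index name is only illustrative):

CREATE NONCLUSTERED INDEX IX_DATA_TYPEID ON DATA (TYPEID)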

If I simply remove the INT109 = 1 filter at the end, the query runs almost instantly:

SELECT ID, NAME, INT109 FROM DATA WHERE TYPEID = 8301514

It makes no sense to me that having fewer filters would make a query run much faster. The behavior also seems to be inconsistent: the first query can run really fast too if it has been run multiple times, as if something is being cached. It's difficult to do reliable experiments (this is in SQL Azure). Is this normal behavior? Can it be caused by a bad execution plan (even though I'm not using parameters) or by out-of-date statistics?

Best Answer

What's probably going on here is that at some point you ran that first query with a TYPEID value that matched a lot more than 500 rows, so many that the optimizer decided it was better to scan the 10M rows than to do hundreds of thousands of lookups to fetch the NAME and INT109 columns. That plan got cached and was then reused when you supplied a different value that would have benefited from a different plan. SQL Server assumed it was cheaper to avoid recompiling than to re-evaluate the plan for each value you might provide.
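
One way to test that theory, as a quick diagnostic sketch rather than a fix (reusing the DATA table and literal values from the question), is to force a fresh compile with OPTION (RECOMPILE), which builds a plan for these specific values and does not cache it for reuse:

SELECT ID, NAME, INT109 FROM DATA WHERE TYPEID = 8301514 AND INT109 = 1 OPTION (RECOMPILE)

If the query is consistently fast with that hint, a reused plan is almost certainly the culprit.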

When you ran the modified query (without the INT109 filter), it was compiled fresh and got the best plan for that value, although that plan in turn might not be ideal for a different value.

The best way to fix this is to have an index on (TYPEID, INT109) INCLUDE (ID, NAME), so that no lookups are needed and the plan will seek regardless of the statistics on TYPEID. This index is also useful even if you leave out the INT109 filter.
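
A minimal sketch of that index (the index name is illustrative; adjust it to your naming convention):

CREATE NONCLUSTERED INDEX IX_DATA_TYPEID_INT109 ON DATA (TYPEID, INT109) INCLUDE (ID, NAME)

With that in place, the optimizer can seek on both predicates and return ID and NAME straight from the index, with no key lookups.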