There are a bunch of questions hidden in here.
Let's start with why the logically equivalent `> 0` predicate yields a scan instead of a seek.
The index is filtered on `IS NOT NULL`, so SQL Server knows that no value in this index is NULL. However, it does not know that all values are `> 0`, because an `IDENTITY` property is not a constraint. So, to return all rows `> 0`, SQL Server has to find the first row that actually is `> 0` (using a seek operation) and then scan the rest of the rows from there on. If that scan covers most of the table, spending the additional reads on the initial seek might be more expensive than just scanning the full table. Both approaches lead to the same result in all cases; which one SQL Server picks is based on statistics.
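The two predicate forms under discussion can be sketched like this (the `audit` table and `sometable_id` column names are taken from the question's schema):

```sql
-- Matches the index filter exactly, so the filtered index can satisfy it directly:
SELECT * FROM audit WHERE sometable_id IS NOT NULL;

-- Logically equivalent here, but forces SQL Server to either seek to the first
-- row > 0 and range-scan from there, or scan the whole index:
SELECT * FROM audit WHERE sometable_id > 0;
```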
So, why is the scan faster?
SQL Server has three physical join operators to choose from: the Nested Loops Join, the Hash Join, and the Merge Join. The Merge Join is by far the fastest; however, it requires that both inputs are sorted the same way. For more details on the join operators, check out my JOIN series here: http://sqlity.net/en/1146/a-join-a-day-introduction/
The seek at the bottom of a Nested Loops Join operator gets executed once for each row of the top input. That is a lot of seeks, and they can become costly quickly. SQL Server usually figures out where the so-called "tipping point" lies, at which a full table scan becomes cheaper than repeated seeks. However, that decision is again based on statistics, and it is often off by a few hundred rows.
If the Merge join is faster and the data is already sorted, why does SQL Server not use it for both queries?
That is a good question... Usually, bad decisions by the optimizer are caused by stale statistics (on any table involved in the query). However, the optimizer has limitations, and dealing with advanced features like filtered indexes might force it out of its comfort zone. If updating statistics does not help, we need to help the optimizer. There are several ways that can be done. You could flat out provide a `MERGE JOIN` hint. For this query that might even be safe, as the query will always have to look at all rows in the child table. However, in general, hints should be considered an absolute last resort, as they prevent the optimizer from adapting to changing data. You could also try to rewrite the query in a way that gets the optimizer started on a different path leading to a better result. That is somewhat comparable to hinting, only hidden - usually an even worse strategy. However, if you think back to pre-2005 times, it was very common to utilize query rewrites just to get the optimizer to end up with a better plan. The advantage of this approach is that the next version of the optimizer might recognize the rewritten query as equivalent to the original, so this technique does not have long-term effects. (That could also be bad...)
Another approach altogether is to provide additional information to the optimizer. If you have complex queries, the detail of the statistics might just not be fine-grained enough for the optimizer to make a good choice. In that case it often helps to create a filtered index or a filtered statistics object. In your case, we are looking at a filtered index already. What might be missing is the information that all rows are actually `> 0`, so it might help to add both conditions to the index filter.
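Such a combined filter could look like this (a sketch; the index and column names are assumed from the question's schema):

```sql
-- Rebuild the filtered index so the filter states both facts explicitly:
CREATE NONCLUSTERED INDEX filter_null_audits
    ON audit (sometable_id)
    WHERE sometable_id IS NOT NULL AND sometable_id > 0
    WITH (DROP_EXISTING = ON);
```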
While this all does not give you a real solution, I hope it helps you understand what is going on here. To solve this, I would start with adding that second condition to the same index and see where that leads you.
On a different note: You mention a 50-50 comparison of two execution plans. I assume you are referring to the "Query cost (relative to batch)". While SQL Server displays this information in a very suggestive way, you can never rely on it. This number is helpful to SQL Server internally when comparing different execution plans for the same query. However, it is absolutely useless when comparing two nonidentical queries. It is best to just ignore this value when talking about performance. Use the information you can get back from `STATISTICS IO` and `STATISTICS TIME` instead.
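For example, to compare the two queries by the actual work done rather than by estimated cost:

```sql
-- Report logical/physical reads and CPU/elapsed time per statement:
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- ... run the queries you want to compare here ...

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;
```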
UPDATE to address your questions:
1) To retrieve the rows for any query, SQL Server can always execute a scan. In the right circumstances a seek can be much faster. I was describing what SQL Server would have to do to execute a seek on the audit table to get the rows `> 0`. If SQL Server thinks that would result in more reads than a scan (as in this case), it opts for the scan.
2) To make the merge join hint work, you also need to tell the optimizer that there actually can't be a value `< 0` in the table. In your example you could do that like this:
```sql
ALTER TABLE sometable ADD CONSTRAINT sometable_id_upzero CHECK (id > 0);
```
Because of the existing foreign key this constraint applies to both tables.
If you now execute
```sql
SELECT TOP 50000
       a.id sometableid,
       a.value,
       b.id auditid,
       b.auditvalue
FROM sometable a
INNER MERGE JOIN audit b
    ON a.id = b.sometable_id;
```
SQL Server will use this execution plan:
That is the plan we are aiming for. It is still using the filter_null_audits index, but it now allows for the merge join. I am not sure why this constraint should change whether the MERGE join can be used; it seems to be an optimizer limitation. Again, filtered indexes are a fairly advanced option for the optimizer, and the implementation might not be complete yet.
3) The advice about not trusting the relative query cost is general in nature and did not imply that the numbers are actually wrong for this particular query. I can easily construct two queries where one shows a relative cost of 99% but the other one actually runs a hundred times slower. If you have an example at hand with correct relative costs, that just means you got lucky. :)
Hope that clarified things.
> I assumed it would follow my hint, and maybe error out at execution time if I wound up with some bad data and the index was missing some needed values.
The query optimizer will only use a filtered index in a query plan if it can guarantee (within its reasoning framework) that all possible matches can be served from the index. This is by design, to avoid the sort of runtime error you describe.
> Failure to do so results in a NESTED LOOPS JOIN from my non-clustered index against a clustered index Key Lookup, presumably to grab the parentId. INCLUDING the parent ID eliminates this, and leaves me with a nice non-clustered index scan.
This is a known current limitation. Adding the filtered column(s) to the key or include list is the standard workaround, and a current best practice for all sorts of semi-related reasons.
> The `FORCE ORDER`, `MERGE JOIN` is definitely needed though.
Be extremely careful using hints (directives) like this unless you fully understand all the consequences. `FORCE ORDER` in particular is an extremely powerful and wide-ranging hint, with a number of non-obvious side effects, including the placement of aggregate operators and the order of evaluation of subqueries and common table expressions.
For the most part, you should try to write queries that provide the query optimizer with enough good-quality information to make the right decisions without hints. The hinted plan may be 'optimal' today, but it may not remain so as the data volume and/or distribution changes over time.
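For reference, the hinted form being discussed would look something like this (a sketch; the table and column names are carried over from the earlier example):

```sql
-- OPTION (FORCE ORDER, MERGE JOIN) fixes both the join order and the
-- physical join type - powerful, and to be used only as a last resort:
SELECT a.id, a.value, b.id, b.auditvalue
FROM sometable a
JOIN audit b
    ON a.id = b.sometable_id
OPTION (FORCE ORDER, MERGE JOIN);
```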
Best Answer
No good reason. That is a covering index for that query.
Please vote for the feedback item here: https://feedback.azure.com/forums/908035-sql-server/suggestions/32896348-filtered-index-not-used-when-is-null-and-key-looku
And as a workaround, include the `WhereColumn` in the filtered index:
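A sketch of that workaround (the table and key column names are hypothetical; `WhereColumn` is the filtered column from the question):

```sql
-- Adding the filtered column as an included column lets the filtered
-- index cover the query without a key lookup:
CREATE NONCLUSTERED INDEX IX_MyTable_Filtered
    ON dbo.MyTable (KeyColumn)
    INCLUDE (WhereColumn)
    WHERE WhereColumn IS NULL;
```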