SQL Server – Query with Variable not estimating right

cardinality-estimates, execution-plan, performance, sql-server

I have a query with several joins that is not performing well. After a lot of triage, I decided to take the query apart and look at just the driving table. Below is an example of the query based on just the driving table.

DECLARE @VAR AS INT

SET @VAR = 11652862

SELECT Field1, Field2
FROM tablea
WHERE Field2 = @VAR

When I run the code as-is, the actual execution plan comes back with 15k estimated rows versus 1m actual rows. Yes, this is just a simple SELECT...FROM TABLE, but remember it is only part of a bigger query, so this estimate difference compounds greatly as more joins are added.

I know the problem is @VAR: the optimizer does not know the value of the variable when it compiles the query into a plan, so it falls back to the average number of rows per value for that field. If I add OPTION (RECOMPILE), the actual value is available at compile time and the estimate comes back with the correct number of rows.
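For illustration, this is the repro above with the hint added (this is the workaround I cannot actually deploy, since I can't touch the application's code):

```sql
DECLARE @VAR AS INT;
SET @VAR = 11652862;

SELECT Field1, Field2
FROM tablea
WHERE Field2 = @VAR
OPTION (RECOMPILE);  -- @VAR's value is known at compile time,
                     -- so the histogram step for 11652862 is used
```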

Here is the kicker: This is not custom code that my company can edit. It is part of a third party app that I cannot edit. Updating stats will not help (I looked at the histogram and it is correctly listing the value for my variable). I already have a covering NC index on those two fields. The only option I can think of is to use a forced execution plan on this query. I have never been able to successfully do that, though. Anyone have any other ideas or a good How-To reference on forced execution plans?

Since it was asked in the comments, the create statement for the index is:

CREATE NONCLUSTERED INDEX idx ON tablea (Field2)
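For reference, the kind of plan guide I have been attempting looks like the sketch below. The guide name is mine, and the @stmt text has to match the statement the application sends character for character (whitespace included), which is where I suspect my attempts go wrong:

```sql
-- Sketch: attaches OPTION (RECOMPILE) to the app's statement without
-- changing its code. @stmt must exactly match the submitted batch text.
EXEC sp_create_plan_guide
    @name = N'pg_tablea_recompile',       -- hypothetical name
    @stmt = N'SELECT Field1, Field2
FROM tablea
WHERE Field2 = @VAR',
    @type = N'SQL',
    @module_or_batch = NULL,              -- NULL: batch text same as @stmt
    @params = NULL,
    @hints = N'OPTION (RECOMPILE)';
```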

Best Answer

I have ideas other than a plan guide and I'll post them here for your consideration, but I cannot recommend actually implementing them. There are at least two different ways of bumping up row estimates for tablea. The first involves manually setting statistics, and the second involves redirecting the third-party app to a view that you define instead of having it access tablea directly.

Both CREATE STATISTICS and UPDATE STATISTICS have a STATS_STREAM option. Here is what Books Online says about it:

Identified for informational purposes only. Not supported. Future compatibility is not guaranteed.

It's completely understandable if that turns you off from using it, but what it allows you to do is to transplant the statistics from one object to another. Suppose that you want to increase cardinality estimates from queries that use tablea by 50X. You could take the data from tablea, duplicate it 50 times, gather stats with FULLSCAN, then update the statistics on tablea using the STATS_STREAM value from the other table along with the NORECOMPUTE option. You probably want to change the ROWCOUNT value as well. This should have a pretty large effect on all queries that reference the table. You could run into problems with other queries, with the underlying data changing in the table, with statistics on indexes not getting updated, and so on. This isn't a good option.
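The mechanics might be sketched as follows. Every object name here is hypothetical, the blob value is elided, and since STATS_STREAM is undocumented none of this is guaranteed to work across versions:

```sql
-- Sketch only: STATS_STREAM is undocumented and unsupported.

-- 1. Materialize the data duplicated 50 times (tablea_x50 is hypothetical).
SELECT ta.*
INTO tablea_x50
FROM tablea ta
CROSS JOIN (SELECT TOP (50) 0 AS n FROM sys.all_objects) d;

-- 2. Gather full-scan statistics on the duplicated copy.
CREATE STATISTICS s_field2 ON tablea_x50 (Field2) WITH FULLSCAN;

-- 3. Read the binary stats blob off the copy.
DBCC SHOW_STATISTICS ('tablea_x50', 's_field2') WITH STATS_STREAM;

-- 4. Transplant the blob onto tablea's statistics and pin it
--    (paste the blob from step 3 in place of 0x...).
UPDATE STATISTICS tablea idx WITH STATS_STREAM = 0x..., NORECOMPUTE;
UPDATE STATISTICS tablea idx WITH ROWCOUNT = 50000000, PAGECOUNT = 500000;
```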

Depending on how the third party application connects to SQL Server, you may be able to create a view, in a different schema, with the same name as tablea, and direct that part of the application to use the view. The view should have all of the same columns as tablea but should contain additional code that inflates the optimizer's row estimate without changing the rows actually returned. I didn't try very hard to get this to work because it seems very impractical, but the approach was inspired by Adam Machanic's article about forcing parallel plans:

SELECT ta.*
FROM tablea ta
CROSS JOIN (
    SELECT t.t
    FROM (
        VALUES
        (2), (1), (1), (1), (1), (1), (1), (1), (1), (1),
        (1), (1), (1), (1), (1), (1), (1), (1), (1), (1),
        -- lots more values here ...
        (1), (1), (1), (1), (1), (1), (1), (1), (1), (1)
    ) t(t)
    WHERE t.t % 2 = 0
) t
OPTION (QUERYTRACEON 8690);

I wasn't really satisfied with that query but didn't spend much time on it. In that form it does a single scan of the table, but the query optimizer overestimates the cardinality by 100X (only the single (2) row survives the WHERE clause at run time). TF 8690 is there to prevent a pointless table spool, but that hint probably isn't appropriate in a view or in a larger query. If the same application inserts data into this table you'd need to deal with that somehow, too. This isn't a good option either.
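Wired into a view, the redirection idea might look like the sketch below. The appuser schema name is hypothetical, the VALUES list is abbreviated, and OPTION clauses aren't allowed inside views, so the TF 8690 hint cannot come along:

```sql
-- Sketch: the app would need to resolve the name "tablea" to this view
-- instead of dbo.tablea, e.g. via the login's default schema.
CREATE VIEW appuser.tablea
AS
SELECT ta.*
FROM dbo.tablea ta
CROSS JOIN (
    SELECT v.n
    FROM (
        VALUES (2), (1), (1), (1), (1)  -- padded out with many more (1) rows
    ) v(n)
    WHERE v.n % 2 = 0    -- only the single (2) row survives at run time
) x;
```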

To reiterate, I don't think that you should attempt either of these options. It would be much better to talk to the vendor, or if that doesn't work, to suck it up and try again with a plan guide. I don't have any experience with those so I can't be of help.