SQL Server – Handling Non-Optimal Cached Execution Plans

Tags: execution-plan, index, plan-cache, sql-server

Our application uses SQL Server 2014, and we have run into an issue related to the plan cache.

We have a parametrized query whose optimal execution plan depends on the parameter values. The server caches an execution plan that is not optimal in some cases and then reuses it for all subsequent executions.

Details:

We have a table with the following columns:

(
 [Revision] [bigint] IDENTITY(1,1) NOT NULL,
 [UserId] [uniqueidentifier] NOT NULL,
 ...A WHOLE LOT OF OTHER COLUMNS...
)

The meaning of those two columns is pretty clear: UserId is the Id of the user that the record belongs to, and Revision is an auto-incrementing index of the record. The other columns are not important here, but they exist and affect execution plans.

The table contains ~40.000.000 rows and ~200.000 distinct UserId values, so each user has 200 records on average. Rows are never updated; we only use INSERT and DELETE to modify data.

Our application executes the following query against this table:

SELECT * FROM SampleTable WHERE Revision > {someRevision} AND UserId = {someId}

The table has two indexes:

Clustered index: Revision asc
Non-Clustered index: UserId asc, Revision asc

When I execute this query manually, I see that the execution plan depends on the value of someRevision.

If it's relatively close to the current max value of Revision, the server uses Clustered Index Seek with Seek Predicate: Revision > someRevision
If it isn't close, the server uses Index Seek (NonClustered) + Key Lookup (Clustered) with Seek Predicate: UserId = someId AND Revision > someRevision.

Our application uses Linq-To-Sql, which generates parametrized queries that look like this:

exec sp_executesql N'SELECT * FROM [SampleTable] AS [t0]
WHERE ([t0].[Revision] > @p0) AND ([t0].[UserId] = @p1)',
N'@p0 bigint,@p1 uniqueidentifier',
@p0=1234,@p1='bc38dd12-238c-41a2-9dea-bb12ce105e6d'

I used sys.dm_exec_cached_plans, sys.dm_exec_sql_text and sys.dm_exec_query_plan and found that the server keeps a single plan for this query in the cache. So, if the first execution happens to use a Revision value close to the current maximum, the plan using Clustered Index Seek is stored in the plan cache and is then reused for all subsequent executions.

This leads to an excessive number of logical reads (x10000) and unacceptable execution times for queries that should be executed using the second plan (Index Seek (NonClustered) + Key Lookup (Clustered)).

I also noticed that the threshold at which the server switches between plans (the tipping point) depends on statistics. If they are stale, the plan can be sub-optimal regardless of the cache, because the server incorrectly estimates the number of rows with Revision greater than the given value.
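For reference, a cache lookup along these lines could look like the following. This is only a sketch: the LIKE filter on the table name is an assumption about how to locate the statement, and querying these views requires VIEW SERVER STATE permission.

```sql
-- Find cached plans whose statement text mentions SampleTable.
-- usecounts shows how often the single cached plan has been reused.
SELECT
    cp.usecounts,
    cp.objtype,
    st.text       AS statement_text,
    qp.query_plan AS cached_plan_xml
FROM sys.dm_exec_cached_plans AS cp
CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle)   AS st
CROSS APPLY sys.dm_exec_query_plan(cp.plan_handle) AS qp
WHERE st.text LIKE N'%SampleTable%'
  -- Exclude this diagnostic query itself from the results.
  AND st.text NOT LIKE N'%dm_exec_cached_plans%';
```

Opening cached_plan_xml shows which of the two plan shapes was compiled, and the ParameterCompiledValue attributes in the XML show the parameter values it was compiled for.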
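To check whether statistics are stale, something like the following could be used. It is a sketch that assumes the table lives in the dbo schema; WITH FULLSCAN is one possible choice and can be expensive on a table of this size.

```sql
-- Show when each statistics object on the table was last updated
-- and how many modifications have accumulated since then.
SELECT
    s.name AS stats_name,
    sp.last_updated,
    sp.rows,
    sp.rows_sampled,
    sp.modification_counter
FROM sys.stats AS s
CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
WHERE s.object_id = OBJECT_ID(N'dbo.SampleTable');

-- Refresh all statistics on the table by reading every row.
UPDATE STATISTICS dbo.SampleTable WITH FULLSCAN;
```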

In addition, we have a large set of similar tables with similar use cases and all of them have the same issue.

What can I do to solve this issue?

I could try to use OPTION (RECOMPILE), which is not easy with Linq-To-Sql, and it also doesn't look optimal in performance terms.

I could also use sp_create_plan_guide, or hack Linq-To-Sql even more and add a WITH (INDEX(...)) hint to force the second plan, but as I said, there are a lot of tables with the same core structure, so this looks like a lot of manual work.

Generally, my questions:

Can SQL Server detect that the plan stored in the cache is not optimal for the given parameters and avoid using it?

Are there best practices for handling parametrized queries whose optimal execution plans depend on the parameters?

Best Answer

This is called parameter sniffing, and it's covered extensively in Erland Sommarskog's epic post, Slow in the App, Fast in SSMS.

I can't even begin to do justice to it here, but sample solutions include:

OPTION (RECOMPILE) - which causes increased CPU use for the plan compilation, plus loses historical metrics of the query execution, but can build a unique plan for each set of parameters (although it can still be a suboptimal plan in cases of cardinality estimation issues)
Optimizing for a specific value - if you know your data well, you can use an OPTIMIZE FOR hint so that a plan is always built for a specific parameter value, regardless of what the user passed in. This is like creating technical debt - if your data skew changes, you may have to revisit your code in order to get a better plan.
Using index hints - which are generally worse than optimizing for a specific value because not only are you bossing the query optimizer around, but if that index disappears, your query simply fails. SQL Server doesn't try to use an alternate index for your query.
Plan guide - but if anything whatsoever changes about your query, even a single letter, then the plan guide will no longer match.
Combination of query and index tuning - get the developers to avoid selecting * (all the fields), and just get the fields they truly need. Then, build a covering index to match, and you'll get a single query plan that works well for all parameters.
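As a rough illustration of how several of these options look when applied to the query from the question, here is a sketch. The dbo schema, the index names, and the ColA/ColB columns are hypothetical stand-ins; the real table has many columns that are not listed in the question.

```sql
-- Assumed names: dbo schema, IX_SampleTable_UserId_Revision, ColA, ColB.
DECLARE @params nvarchar(200) = N'@p0 bigint, @p1 uniqueidentifier';
DECLARE @rev bigint = 1234;
DECLARE @user uniqueidentifier = 'bc38dd12-238c-41a2-9dea-bb12ce105e6d';

-- 1. OPTION (RECOMPILE): compile a fresh plan on every execution.
EXEC sys.sp_executesql
    N'SELECT * FROM dbo.SampleTable AS t0
      WHERE t0.Revision > @p0 AND t0.UserId = @p1
      OPTION (RECOMPILE);',
    @params, @p0 = @rev, @p1 = @user;

-- 2. OPTIMIZE FOR: always compile as if @p0 were 0 (a very old revision),
--    whatever value is actually passed in.
EXEC sys.sp_executesql
    N'SELECT * FROM dbo.SampleTable AS t0
      WHERE t0.Revision > @p0 AND t0.UserId = @p1
      OPTION (OPTIMIZE FOR (@p0 = 0));',
    @params, @p0 = @rev, @p1 = @user;

-- 3. Index hint: force the nonclustered index; the query fails
--    if an index with this name does not exist.
EXEC sys.sp_executesql
    N'SELECT * FROM dbo.SampleTable AS t0
          WITH (INDEX(IX_SampleTable_UserId_Revision))
      WHERE t0.Revision > @p0 AND t0.UserId = @p1;',
    @params, @p0 = @rev, @p1 = @user;

-- 4. Query and index tuning: select only the needed columns and
--    cover them, so no key lookup is required for any parameter value.
CREATE NONCLUSTERED INDEX IX_SampleTable_UserId_Revision_Covering
    ON dbo.SampleTable (UserId, Revision)
    INCLUDE (ColA, ColB);

EXEC sys.sp_executesql
    N'SELECT t0.Revision, t0.UserId, t0.ColA, t0.ColB
      FROM dbo.SampleTable AS t0
      WHERE t0.Revision > @p0 AND t0.UserId = @p1;',
    @params, @p0 = @rev, @p1 = @user;
```

Which of these fits depends on the workload; the trade-offs are the ones listed above.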

Head on over and tackle Erland's excellent article - not only will it pay dividends today, but it will continue to pay off over your career as you solve this problem again and again. The solution that works well for your query today is likely to be very different than the solution you use for another query tomorrow.

Related Solutions

Sql-server – Index not making execution faster, and in some cases is slowing down the query. Why is it so

Even though the index is suggested by the SQL Server, why does it slow things down by a significant difference?

Index suggestions are made by the query optimizer. If it comes across a logical selection from a table which is not well served by an existing index, it may add a "missing index" suggestion to its output. These suggestions are opportunistic; they are not based on a full analysis of the query, and do not take account of wider considerations. At best, they are an indication that more helpful indexing may be possible, and a skilled DBA should take a look.

The other thing to say about missing index suggestions is that they are based on the optimizer's costing model, and the optimizer estimates by how much the suggested index might reduce the estimated cost of the query. The key words here are "model" and "estimates". The query optimizer knows little about your hardware configuration or other system configuration options - its model is largely based on fixed numbers that happen to produce reasonable plan outcomes for most people on most systems most of the time. Aside from issues with the exact cost numbers used, the results are always estimates - and estimates can be wrong.

What is the Nested Loop join which is taking most of the time and how to improve its execution time?

There is little to be done to improve the performance of the cross join operation itself; nested loops is the only physical implementation possible for a cross join. The table spool on the inner side of the join is an optimization to avoid rescanning the inner side for each outer row. Whether this is a useful performance optimization depends on various factors, but in my tests the query is better off without it. Again, this is a consequence of using a cost model - my CPU and memory system likely has different performance characteristics than yours. There is no specific query hint to avoid the table spool, but there is an undocumented trace flag (8690) that you can use to test execution performance with and without the spool. If this were a real production system problem, the plan without the spool could be forced using a plan guide based on the plan produced with TF 8690 enabled. Using undocumented trace flags in production is not advised because the installation becomes technically unsupported and trace flags can have undesirable side-effects.

Is there something that I am doing wrong or have missed?

The main thing you are missing is that although the plan using the nonclustered index has a lower estimated cost according to the optimizer's model, it has a significant execution-time problem. If you look at the distribution of rows across threads in the plan using the Clustered Index, you will likely see a reasonably good distribution:

(Image: Scan plan)

In the plan using the Nonclustered Index Seek, the work ends up being performed entirely by one thread:

(Image: Seek plan)

This is a consequence of the way work is distributed among threads by parallel scan/seek operations. It is not always the case that a parallel scan will distribute work better than an index seek - but it does in this case. More complex plans might include repartitioning exchanges to redistribute work across threads. This plan has no such exchanges, so once rows are assigned to a thread, all related work is performed on that same thread. If you look at the work distribution for the other operators in the execution plan, you will see that all work is performed by the same thread as shown for the index seek.

There are no query hints to affect row distribution among threads; the important thing is to be aware of the possibility and to be able to read enough detail in the execution plan to determine when it is causing a problem.
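One way to look at the per-thread row counts without the graphical plan is to capture the actual plan as XML. The query below is only a stand-in (the original query is not reproduced here, and this one may not even run in parallel); the point is the SET STATISTICS XML wrapper. In the returned showplan, each operator carries RunTimeCountersPerThread elements with Thread and ActualRows attributes.

```sql
-- Return the actual execution plan as XML alongside the results.
SET STATISTICS XML ON;

-- Stand-in query; substitute the statement being investigated.
SELECT COUNT_BIG(*)
FROM dbo.IndexTestTable AS ITT
WHERE ITT.Name = N'Name1';

SET STATISTICS XML OFF;
```

If one Thread value accounts for nearly all ActualRows on an operator, the work is skewed in the way described above.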

With the default index (on primary key only) why does it take less time, and with the non clustered index present, for each row in the joining table, the joined table row should be found quicker, because join is on Name column on which the index has been created. This is reflected in the query execution plan and Index Seek cost is less when IndexA is active, but why still slower? Also what is in the Nested Loop left outer join that is causing the slowdown?

It should now be clear that the nonclustered index plan is potentially more efficient, as you would expect; it is just poor distribution of work across threads at execution time that accounts for the performance issue.

For the sake of completing the example and illustrating some of the things I have mentioned, one way to get a better work distribution is to use a temporary table to drive parallel execution:

SELECT
    val1,
    val2
INTO #Temp
FROM dbo.IndexTestTable AS ITT
WHERE Name = N'Name1';

SELECT 
    N'Name1',
    SUM(T.val1),
    SUM(T.val2),
    MIN(I2.Name),
    SUM(I2.val1),
    SUM(I2.val2)
FROM   #Temp AS T
CROSS JOIN IndexTestTable I2
WHERE
    I2.Name = N'Name1'
OPTION (FORCE ORDER, QUERYTRACEON 8690);

DROP TABLE #Temp;

This results in a plan that uses the more efficient index seeks, does not feature a table spool, and distributes work across threads well:

(Image: Optimal plan)

On my system, this plan executes significantly faster than the Clustered Index Scan version.

If you're interested in learning more about the internals of parallel query execution, you might like to watch my PASS Summit 2013 session recording.

SQL Server – How to Read an Execution Plan

Why don't your input parameters match the type for the table? Why would you want to keep the wrong types there and perform any casts or conversions at all (whether implicit or explicit)? Why are you converting anything to FLOAT, of all things? To address specific questions:

My query says [low] is being converted from int to numeric but that doesn't seem to be what the cluster index seek details is showing, is it?

The convert of low is happening in the output, not in the seek predicate (the predicate is what is used to find matching rows and/or eliminate non-matching rows).

Is there a way to tell in this specific example how much "Better" it would be if I did the conversions Explicitly? I would think I could just do a "cast" in the query and insert a few numbers where the variables are, is that correct?

There's no way to make the execution plan show you how much better a different plan would be, except to generate that different plan and compare. You can use this comparison to document how much better it would be if the interface were correct. Two other options keep the interface as it is: (a) perform explicit converts in the query, not of the column but of the variables, or (b) use local variables of the right type and assign them the values of the parameters. So you could show them three different ways to solve the problem, with evidence that all three are better than the current version.
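Option (b) might look roughly like this. It is a sketch only: the procedure name is hypothetical, and the VARCHAR(8000) parameter types are an assumption about the existing interface, which is not shown here.

```sql
-- Hypothetical name; parameter types are assumed, not taken from the original.
CREATE PROCEDURE dbo.some_name_local_variables
  @1 VARCHAR(8000),
  @2 VARCHAR(8000),
  @3 VARCHAR(8000)
AS
BEGIN
  SET NOCOUNT ON;

  -- Convert once, into variables that match the column types.
  DECLARE @number  INT           = CONVERT(INT, @1),
          @type    NCHAR(3)      = CONVERT(NCHAR(3), @2),
          @divisor NUMERIC(4, 0) = CONVERT(NUMERIC(4, 0), @3);

  -- The predicates now compare like with like, so no conversion
  -- is applied to the columns.
  SELECT [low] / @divisor
    FROM [master].[dbo].[spt_values]
    WHERE [number] = @number
    AND [type] = @type;
END
GO
```

Option (a) is the same idea without the local variables: write the CONVERT calls on the parameters directly in the WHERE clause.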

My recommendation is to fix the procedure the right way. First let's look at the actual types you care about:

USE master;
GO
SELECT c.name AS column_name, t.name AS type_name,
  c.max_length/CASE WHEN t.name LIKE N'n[cvt]%' THEN 2 ELSE 1 END AS max_length
FROM sys.all_columns AS c 
INNER JOIN sys.types AS t
ON c.system_type_id = t.system_type_id
AND c.system_type_id = t.user_type_id
WHERE EXISTS
(
  SELECT 1 FROM sys.all_objects AS o
    INNER JOIN sys.schemas AS s
    ON o.[schema_id] = s.[schema_id]
    WHERE o.[object_id] = c.[object_id]
    AND o.name = N'spt_values'
    AND s.name = N'dbo'
)
AND c.name IN (N'number',N'type');

Results:

number    int     4
type      nchar   3

So the interface to your stored procedure should be:

USE yourdb;
GO
ALTER PROCEDURE dbo.some_name
  @1 INT,
  @2 NCHAR(3),
  @3 NUMERIC(4, 0)
AS
BEGIN
  SET NOCOUNT ON;

  SELECT CONVERT([float], [low] / @3, 0) -- don't think you want float here
    FROM [master].[dbo].[spt_values]
    WHERE [number] = @1
    AND [type] = @2;
END
GO

Implicit conversions between varchar and nvarchar can be particularly bad (especially in the opposite scenario from yours, where the parameter is nvarchar and the column is varchar), but there really is no reason to allow for an 8000-character parameter of any type when the longest string possible in the table is 3 characters...
