Sql-server – How bad is query plan pollution (too many equivalent plans in cache)

cacheexecution-planparametersql-server-2008-r2

Question currently affects C#/.NET, DB access based on ADO.NET and SQL Server 2008 R2, but I think it applies to other databases as well.

I noticed some old modules of a system have non-optimal SQL queries, with multiple concatenated value strings instead of parameter placeholders. They do a polling on a table, like every 10 seconds, to get items added during the last few minutes, which generates a new query plan on every execution.

Their performance is not too bad, no SQL injection risk (no web/user forms), they're old and it would be a lot of work to change their queries to correct parameterization. I suggest to do this change, but there's debate that it would be a waste of time, with other things being more important.

Edit: The database is supposed to run with mostly parameterized queries (which all newer modules use), so I would like to avoid the "optimize for ad hoc" option. Partially parameterized queries create a plan anyway. Is there a downside when running in ad-hoc optimized mode, with mostly parameterized queries?

To me, it seems like these old modules take a huge portion of the database resources, although they are few. Even a single module of this kind would create thousands of query plans over time, while all newer modules together have less.

Is it importante to change these, or can I leave them in their state, with optimization/parameterized queries only in current/future modules?

SQL is like:

select ItemId, ItemName from Items 
where ItemType=3 and ItemCreator=1234 
and ItemDate >= '2013-11-23 12:30:00'

where the values vary and the date is a few minutes before now. In a few cases, the date has been changed to a parameter like "@startDate", to avoid format problems, but ItemType and ItemCreator values are still concatenated strings.

When monitoring query plans with DMV or Activity Monitor (Recent Expensive Queries – Plan Count column), I notice some of these queries have 8000+ equivalent query plans in cache:

select count(*), query_plan_hash 
from sys.dm_exec_query_stats 
group by query_plan_hash 
order by count(*) desc

then selecting the plan XML with a CROSS APPLY on sys.dm_exec_query_plan, with plan handles selected by query plan hash.

Edit/Temporary conclusion:
It seems like it is best to leave the very old applications as they are, even when creating tons of ad hoc queries. My greatest fear was that the flood of single use ad hoc queries would cause the good, multi-use parameterized and prepared query plans to be evicted from cache. This does not happen, because, when cleanup is done, ad hoc plans are evicted first, and others are rated by factors like complexicity, number of usages etc. So parameterized queries with a high usage rate will likely be kept, no matter how many ad hoc or partially parameterized plans flood in. Ad hoc optimization reduces plan size (actually, no real plans are stored at first use), but even more plans may be kept, with a similar memory usage (is this correct?). Even partially parameterized SQL (DateTime parameters to avoid local format troubles) will quickly get evicted if not used again, even when sent with sp_executesql, which forces parameterization and plan caching. Having a huge number (5000 to 8000+) equivalent ad hoc query plans is no good, but probably less harmful than having to dig through years old C#, C++, maybe even Visual Studio 6.0 code, to fix queries (nobody pays for that, and the stuff is still running without recognizable problems).

Best Answer

"How bad is it?" depends on the degree to which you are suffering now or could suffer with increased workload in the future.

One major point of suffering with plan cache pollution could be too many single use plans bloating your plan cache leading to inefficient cache usage.

Another point of suffering could be high compilations/second - so in an environment with a heavy workload and a lot of activity, there is a cost associated with compiling over and over.

You can see the impact of compilations/sec in perfmon (SQL Server Statistics:Compilations/sec). This can look like CPU pressure. To your performance/applications, this can look like increased query duration waiting for needless compiles each time it runs.

You can see the impact to the plan cache from the memory bloat by this query borrowed from Glenn Berry's Diagnostic scripts. How big is your SQLCP plan cache?

SELECT TOP(10) [type] AS [Memory Clerk Type], 
       SUM(pages_kb)/1024 AS [Memory Usage (MB)] 
FROM sys.dm_os_memory_clerks WITH (NOLOCK)
GROUP BY [type]  
ORDER BY SUM(pages_kb) DESC OPTION (RECOMPILE);

Also the query that was used in the question to identify the number of plans helps as well.

Is This Ever a Good Thing?

There are some cases where this could be good, but the situation is rare. Basically if you were suffering from parameter sniffing gone bad (nutshell: if the data can vary widely from execution to execution based on parameters, one compilation for one set of parameters ideal may yield an excellent query plan for that one query but poor for others.). My guess is that you likely wouldn't be dealing with that as bad as the implications from poor plan reuse.

What Can You Do About It?

  • Optimize For Ad Hoc Workloads can certainly help with the memory implications since only a stub of the plan is stored in cache at first execution, and the full plan isn't stored until it is executed a second time with the same plan.

  • Forced Parameterization could help here also. It can sometimes force parameterization to happen and help solve both the issue of cache bloat and the cost of having to recompile.

  • Fix The Queries Ideally, you shouldn't have to resort to these options, but instead can be more strict in your database development, encourage plan reuse, consider stored procedures for all of their benefits, and attempt to head off the problem that way. The ways to help fix this through forced parameterization or optimize for ad hoc are good to help, but the best solution is always aimed at the root cause.

There is an excellent resource here that talks about some of the dangers of plan cache pollution and some things you can do. I'd recommend a read here. It is written for SQL Server 2012, but the concepts and solutions apply.