Sql-server – High CPU and plan cache instability

cpumemoryplan-cachesql server

In the past couple of months my SQL Server instance has been experiencing higher than usual CPU usage than before, at times hitting 100% and affecting an application.

There have been no recent changes to any SQL Server configurations or options in that time, the OS may have in terms of Patching but that is another teams area, and it was not operating like this before the past couple of months.When we have high sustained period of CPU use, I see mostly wait stats of CXPACKET, LCK_M_X, THREADPOOL, LATCH_EX, RESOURCE_SEMEPHOR_QUERY_COMPILE. CXPACKET for parallelism being by far the largest.

I like to run Brent Ozar’s sp_blitzcache and the one thing that stands out is that the plan cache seems to be cleared by something and 99% of the plans are created in the last hour. This says this could be memory pressure or plan cache instability.I’m not sure what to check of where to look.

I have thought of enabling LPIM which isn’t now but besides this not sure what to try and I would be grateful for any help.

Edit**
I had started to log sp_whoisactive into a table. I found that there was a long running process of the application that was running a very intensive denormalisation process on the database, that gradually built and built until it finally starte a stage when it rebuild indexes.

Obviously this is what killed server performance. We have made sure this does not run again.

However the business has become very twitchy about the appplication/server. On friday there was a period when again the CPU maxed out to 100% for about 10 mins, unlike before when it lasted longer, this time it stopped after the 15mins. From looking at the results of sp_whoisactive I can see before the CPU got maxed I started to see a high number of RESOURCE_SEMAPHORE_QUERY_COMPILE wait types, these carried on for about 10 mins. I kind of understand what this wait type is, but did this cause the high CPU for the period, as some kind of cumulative effect of threads waiting?

If it was, how can I fix this so that it won't happen again?
I don't have any scope to tune queries/IXs, would adding more memory help?

*****Edit 2 *****
Update on thisIt seems that the inital issue was another team who started to run a Denormalisation process on the database that was really intensive and just killed the server.
After they were stopped doing this I recieved very few RESOURCE_SEMAPHORE_QUERY_COMPILE wait types unlike before

The situation was much better than before, however CPU was still hitting 100% at times for longer than I wanted and I was still experiencing plan cache instability, it was continually being trimmed.

After coming across this webpage – https://social.msdn.microsoft.com/Forums/sqlserver/en-US/6b9118f3-b6c1-4f39-ae80-d149392656f9/sql-server-2016-memory-pressure-leads-to-the-plan-cache-clearing?forum=sqldatabaseengineI realised that something at the os level had changed and must have been needing more memory, causing memory pressure and then it taking memory resource away from sql server and the plan cache.

I lowered my instance max server memory setting and things have settled down much more, CPU overall is significantly lower and rarely hitting 100% and the plan cache is a nice size and never being significantly trimmed down.

Best Answer

This DBA.StackExchange answer gives a good place to start tracing the issue.

Specifically the linked Microsoft Docs - Plan Caching in SQL Server 2008 docs (albeit a little dated, but still useful) have a section called Plan stability-related reasons of batch recompilations that details all the reasons the plan cache can be cleared out (you should also read the previous section Causes of Recompilations for some useful info).

There are too many reasons to quote from the above article but the one section I find particularly interesting and might be the easiest place to start is the following:

The following operations flush the entire plan cache, and therefore, cause fresh compilations of batches that are submitted the first time afterwards: Detaching a database, Upgrading a database to SQL Server 2005, Upgrading a database to SQL Server 2008, Restoring a database, DBCC FREEPROCCACHE command, RECONFIGURE command, ALTER DATABASE ... MODIFY FILEGROUP command, Modifying a collation, using ALTER DATABASE … COLLATE command

In my experience, I've personally seen developers schedule commands such as DBCC FREEPROCCACHE to run routinely and cause similar issues. (Additionally I've seen overuse of the OPTION (RECOMPILE) and WITH RECOMPILE clauses all throughout a database causing similar issues as well.)

If you can eliminate the above checklist then you should follow the rest of the guide in the Plan stability-related reasons of batch recompilations docs, maybe looking next at how often you guys are running DDL queries that would cause plan cache recompilation, and then how often are you guys updating statistics across the board.

It's unfortunately going to be an elimination game most likely and won't be a trivial issue to solve, but best of luck.

Related Solutions

Sql-server – Will cleaning data cache help to resolve compiler memory pressure

Take a look at this reference on MSDN regarding sys.dm_os_wait_stats. Here is a quote from the above reference regarding RESOURCE_SEMAPHORE_QUERY_COMPILE:

Occurs when the number of concurrent query compilations reaches a throttling limit. High waits and wait times may indicate excessive compilations, recompiles, or uncachable plans.

This wait type, as quoted above, is from too many compilations happening at a time. What it sounds like, if you are consistently seeing this wait type (unlike going from a cold plan cache) then you have too high of an ad hoc workload. In other words, you may be experiencing the lack of plan reuse (a bunch of plans that aren't parameterized/prepared, or not using stored procedures and therefore a new ad hoc query comes in since there is no plan that can be reused it'll compile a new one).

There are a few ways to remedy this, if that is indeed the case. The real way to fix this is to ensure that prepared plans and stored procedures are being used. A brute force method is by setting the database to utilize forced parameterization. I would save the latter as a last resort, as there could be some downsides to forcing parameterization.

There are a few ways to keep an eye on how your plan reuse is looking. The easiest (in my opinion) would be to capture a few perfmon counters for the instance:

\SQLServer:SQL Statistics\Batch Requests/sec
\SQLServer:SQL Statistics\SQL Compilations/sec
\SQLServer:SQL Statistics\SQL Re-Compilations/sec

It's a pretty good rule of thumb that compilations per sec shouldn't exceed 10% of batch requests per second, and recompilations per second shouldn't exceed 10% of compilations per second. I typically go from there when troubleshooting plan cache reuse. Feel free to monitor these counters, and post the results in your question and I can help interpret.

And to answer your direct questions, no reducing index sizes or clearing the data cache will not alleviate your issues.

Sql-server – Plan cache size and reserved memory

The reason that the size_in_bytes field of the sys.dm_exec_cached_plans DMV, at least in terms of "Compiled Plans", is larger than the CachedPlanSize attribute of the QueryPlan node in the XML plan is because a Compiled Plan is not the same thing as a Query Plan. A Compiled Plan is comprised of multiple Memory Objects, the combined size of which equates to the size_in_bytes field. So, the description of "Number of bytes consumed by the cache object" that you found in the documentation is accurate; it's just that it is easy to misinterpret what is meant by "cache object" given the name of the DMV and that the term "plan" has multiple meanings.

A Compiled Plan is a container that holds various pieces of information related to the query batch (i.e. not just a single statement), one (or more) of those pieces being the query plan(s). Compiled Plans have a top-level Memory Object of MEMOBJ_COMPILE_ADHOC which is the row in sys.dm_os_memory_objects that is linked via the memory_object_address field in both DMVs. This Memory Object contains the symbol table, parameter collection, links to related objects, accessor cache, TDS metadata cache, and possibly some other items. Compiled Plans are shared amongst Sessions/Users that are executing the same batch with the same Session settings. However, some related objects are not shared between Sessions/Users.

Compiled Plans also have one or more dependent objects that can be found by passing the plan_handle (in sys.dm_exec_cached_plans) into the sys.dm_exec_cached_plan_dependent_objects DMF. There are two types of dependent objects: Executable Plan (Memory Object = MEMOBJ_EXECUTE) and Cursor (Memory Object = MEMOBJ_CURSOREXEC). There will be 0 or more Cursor objects, one per each cursor. There will also be one or more Executable Plan objects, one per each User executing that same batch, hence Executable Plans are not shared between Users. Executable Plans contain run-time parameter and local variable info, run-time state such as the currently executing statement, object ids for objects created at run-time (I assume this refers to Table Variables, Temporary Tables, Temporary Stored Procedures, etc), and possibly other items.

Each statement within a multi-statement batch is contained within a Compiled Statement (Memory Object = MEMOBJ_STATEMENT). The size of each Compiled Statement (i.e. pages_in_bytes) divided by 1024 should match the CachedPlanSize="xx" values of the <QueryPlan> nodes in the XML plan. Compiled Statements will often have one (possibly more?) associated runtime Query Plans (Memory Object = MEMOBJ_XSTMT). Finally, for each runtime Query Plan that is a query, there should be an associated Query Execution Context (Memory Object = MEMOBJ_QUERYEXECCNTXTFORSE).

With respect to Compiled Statements, single-statement batches do not have separate Compiled Statement (i.e. MEMOBJ_STATEMENT) or separate runtime Query Plan (i.e. MEMOBJ_XSTMT) objects. The value for each of those objects will be stored in the main Compiled Plan object (i.e. MEMOBJ_COMPILE_ADHOC), and in that case, the pages_in_bytes value for that main object divided by 1024 should match the CachedPlanSize size in the <QueryPlan> node of the XML plan. Those values will not equate, however, in multi-statement batches.

The size_in_bytes value can be derived by summing the entries in the sys.dm_os_memory_objects DMV (the items noted above in bold), all related by dm_os_memory_objects.page_allocator_address for that Compiled Plan. The trick to getting the correct value is to first get the memory_object_address from sys.dm_exec_cached_plans for a particular Compiled Plan, then use that to get the corresponding MEMOBJ_COMPILE_ADHOC row from sys.dm_os_memory_objects based on its memory_object_address field. Then, grab the page_allocator_address value from sys.dm_os_memory_objects for that row, and use it to grab all rows from sys.dm_os_memory_objects that have the same page_allocator_address value. (Please note that this technique does not work for the other Cached Object types: Parse Tree, Extended Proc, CLR Compiled Proc, and CLR Compiled Func.)

Using the memory_object_address value obtained from sys.dm_exec_cached_plans, you can see all of the components of the Compiled Plan via the following query:

DECLARE @CompiledPlanAddress VARBINARY(8) = 0x00000001DC4A4060;

SELECT obj.memory_object_address, obj.pages_in_bytes, obj.type
FROM   sys.dm_os_memory_objects obj
WHERE  obj.page_allocator_address = (
                               SELECT planobj.page_allocator_address
                               FROM   sys.dm_os_memory_objects planobj
                               WHERE  planobj.memory_object_address = @CompiledPlanAddress
                              )
ORDER BY obj.[type], obj.pages_in_bytes;

The query below lists all of the Compiled Plans in sys.dm_exec_cached_plans along with the Query Plan and statements for each batch. The query directly above is incorporated into the query below via XML as the MemoryObjects field:

SELECT cplan.bucketid,
       cplan.pool_id,
       cplan.refcounts,
       cplan.usecounts,
       cplan.size_in_bytes,
       cplan.memory_object_address,
       cplan.cacheobjtype,
       cplan.objtype,
       cplan.plan_handle,
       '---' AS [---],
       qrypln.[query_plan],
       sqltxt.[text],
       '---' AS [---],
       planobj.pages_in_bytes,
       planobj.pages_in_bytes / 1024 AS [BaseSingleStatementPlanKB],
       '===' AS [===],
       cplan.size_in_bytes AS [TotalPlanBytes],
       bytes.AllocatedBytes,
       (SELECT CONVERT(VARCHAR(30), obj.memory_object_address, 1)
               AS [memory_object_address], obj.pages_in_bytes, obj.[type]
               --,obj.page_size_in_bytes
        FROM   sys.dm_os_memory_objects obj
        WHERE  obj.page_allocator_address = planobj.page_allocator_address
        FOR XML RAW(N'object'), ROOT(N'memory_objects'), TYPE) AS [MemoryObjects]
FROM   sys.dm_exec_cached_plans cplan
OUTER APPLY sys.dm_exec_sql_text(cplan.[plan_handle]) sqltxt
OUTER APPLY sys.dm_exec_query_plan(cplan.[plan_handle]) qrypln
INNER JOIN sys.dm_os_memory_objects planobj
        ON planobj.memory_object_address = cplan.memory_object_address
OUTER APPLY (SELECT SUM(domo.[pages_in_bytes]) AS [AllocatedBytes]
             FROM   sys.dm_os_memory_objects domo
             WHERE  domo.page_allocator_address = planobj.page_allocator_address) bytes
WHERE  cplan.parent_plan_handle IS NULL
AND    cplan.cacheobjtype IN (N'Compiled Plan', N'Compiled Plan Stub')
--AND cplan.plan_handle = 0x06000D0031CD572910529CE001000000xxxxxxxx
ORDER BY cplan.objtype, cplan.plan_handle;

Please note that:

the TotalPlanBytes field is just a re-statement of the sys.dm_exec_cached_plans.size_in_bytes field,
the AllocatedBytes field is the SUM of the related memory objects that typically matches TotalPlanBytes (i.e. size_in_bytes)
the AllocatedBytes field will occasionally be greater than TotalPlanBytes (i.e. size_in_bytes) due to the memory consumption increasing during execution. This seems to happen mostly due to recompilation (which should be evident with the usecounts field showing 1)
the BaseSingleStatementPlanKB field should match the CachedPlanSize attribute of the QueryPlan node in the XML, but only when using a single query batch.
for batches with multiple queries, there should be rows marked as MEMOBJ_STATEMENT in sys.dm_os_memory_objects, one for each query. The pages_in_bytes field for these rows should match the individual <QueryPlan> nodes of the XML plan.

Resources:

What are the different cached objects in the plan cache?
1.0 Structure of the Plan Cache and Types of Cached Objects
2.0 Sql_Handle and Plan_Handle Explained
Documentation for sys.dm_exec_cached_plans
Documentation for sys.dm_os_memory_objects
Documentation for sys.dm_exec_sql_text
Documentation for sys.dm_exec_query_plan

Best Answer

Related Solutions

Sql-server – Will cleaning data cache help to resolve compiler memory pressure

Sql-server – Plan cache size and reserved memory

Related Question