We have this problem in our production environment.
Microsoft SQL Server 2008 R2 (SP1) – 10.50.2500.0 (X64) – Enterprise Edition (64-bit) on Windows NT 6.1 (Build 7601: Service Pack 1).
SQL Server is dropping all (almost 100% of) the old execution plans and recreating them everyday overnight (from 11:00 PM to 8:00 AM). This was even happening when the 'auto update stats' was at disabled state. We've turned the 'auto update stats' on for the last 2-3 weeks. But it's still happening.
We don’t really know what triggers this re-generation of plans but we are sure we don’t do it manually.
The only thing that really coincides with the timing of the plans being regenerated is a DB maintenance job that we have: the daily index reorganizing (when fragmentation is 5-30%), and the daily index rebuilding (when fragmentation is more than 30%) job. Usually this daily maintenance job only does reorganizing (as the index fragmentation is never more than 30% on daily basis).
Impact:
These newly created plans make some UDF calls/query calls (that are called from UI/web pages) take way longer (minutes as opposed to less than 1 second), and so the sessions just get piled up taking the CPU close to 90%.
The problem goes away the moment those stuck sessions are forcefully deleted (on the DB side), and 1) when all the corresponding execution plans are cleared manually (for queries) or 2) when the UDFs are altered (for functions). Any new plans created by SQL server from that moment work just perfect throughout the day until it ends up having the same issue next morning. Also, this behavior is not 100% consistent, we're not really seeing it each and every morning. But there have been periods of time where we've been seeing it consistently for 4-5 days in a row.
The problem happens on business mornings, that's when UI/web pages are accessed more intensively, it seems.
Does anyone have a clue what is causing this and how to solve this problem? Any help would be much appreciated.
Best Answer
Well I have some ideas which could cause this behavior.
optimize for ad hoc workloads
, which will just save a plan stub and compile it if it's needed. This will reduce the load of your plancache, which will lower the chance of an plancache flushing. You can enable it usingsp_configure 'optimize for ad hoc workloads',1; reconfigure
. This can be done if you have enabled theadvanced options
usingsp_configure 'show advanced options',1; reconfigure
.Just beside all of this possibilities, it may be useful to check the log files for some changes to the options
affinity mask
,affinity I/O mask
and their x64 partners. Another thing can be a change of theMAXDOP
option of your instance. Please check the logs for them too. They will need to flush the plancache too.Last but not least, you can still run a serverside trace (just setting it up using the profiler, start it, stop it and using sql command to start it again on the serverside). Beside that
perfmon
is your friend. It can watch and monitor your performance values for a time. Maybe you can see parallels in the pressure with certain actions on your server which may cause those flush.Hopefully this will help you, even if the answer come a bit later.