Sql-server – Consistent drop in performance on SQL Server 2016 SP2

sql-server-2016

Recently upgraded from 2008R2 to 2016SP2. Have a slow process that I can't change. Application contacts the server back and forth constantly and runs tiny queries, including doing an update 30K+ times serially.

I've tried any suggested indexes, etc…to try to help since it's a sticking point for completing the upgrade to production. The new server is loaded with twice as many procs, double the memory, etc..

The interesting thing about the process is that in testing it, using the same process and parameters, every time the server is restarted, it runs fast, then each subsequent run is about 2 minutes slower. It steadies about about the 5th or 6th run, but it's this same pattern every time. I would expect it to get better as it stores the plans and reuses them, but it's consistently the same behavior. I know that the underlying process needs to be corrected, but I don't have any control over that atm, and I find the pattern very odd, since nothing else is running on the server.

Testing has consisted of running the process from the app several times after each change. Started in 2008 compatibility mode, reindexed and updated, stats, applied missing indexes based on cache info., updated to use current 2016 Query Optimizer and Cardinality Estimator, added Optimize for adhoc queries, moved to full 2016 compatibility mode. With each change, the same pattern ensues.

The baseline runs against a 2008DB and is consistently somewhere in the middle of the times I get for 2016. It is run from the same application and workstation.

VMs for the app server and DB server. No AGs yet. The queries are all trivial in time. There's tons, but the queries themselves aren't slowing down. Max memory around 64GBs (and nothing else is hitting this server right now),

*Update:
I'm sorting through some Extended Event logs now to see if I can pare down specifics as to what runs longer. Even sorting out the query hash = 0, each run executes 13602 executions (ad hoc sql statements + rpc completed), so it's a lengthy process. Was hoping for some insight as to why a SQL Server restart would have a 10 minute difference on the duration of the process since nothing else is running on this server. I would think a restart would make it slower the first few runs, but it's exactly the opposite. Even things like sp_setapp role are taking longer on subsequent runs.

Any ideas why I would have query hash plans all equal to "0" on queries over 20 secs? Is this telling me these are compilation timeout queries?

Best Answer

Solved: Not sure it will help anyone else, but just in case it will, what I found was the serial nature of the process (one command per connection) was causing the process to send 143,000+ sp_setapprole calls during the process. This was bloating the TokenAndPermUserStore cache by 1GB each time it ran. This is why a restart of the service saw it run quickly and subsequent runs got slower as it took longer to search through the security cache. Long term fix is, of course, to have the process rewritten correctly with aggregate inserts and updates, but as a temporary solution, I have a job monitoring the size of the cache and flushing it with DBCC FREESYSTEMCACHE ('TokenAndPermUserStore'). This will add some IO to other calls, but sometimes we're forced to do what we can with the code we're given.