SQL Server – Troubleshooting CPU Issues

azure-vmsql serversql server 2014

We've inherited some SQL server 2014 databases, they are running on Azure IaaS. We're getting complaints about performance and trying to get a grip on it. We're starting to look at CPU performance, and run the following test (all tests are running on database machine itself, ie not via network):

SET NOCOUNT ON
DECLARE @Loops INT SET @Loops = 1 
DECLARE @RESULT INT

WHILE @Loops <= 2000000 
BEGIN
    IF COALESCE('123', '456') = '456'  
            SET @RESULT = 1 
    SET @Loops = @Loops + 1   
END

GO

On Production machine, 16 Virtual Processors (20 to 30% busy), we get the following result:

Completes in 3 minutes

On test machine, 4 Virtual Processors (also around 20 to 30% busy) we get the following:

Completes in 1 second

I've tried running a few benchmarks on the production machine, but they don't show any problems, and always show the production machine CPU at least twice as powerful as test machine.

Is there some SQL Server setting that might be configured wrongly? Any idea how to troubleshoot this issue?

Other Infrastructure Information:

Both databases have
- "max degree of parallelism"set to 1
- "cost threshold for parallelism" set to 5
- Azure VM storage is Premium SSD (P30)
Production (slow) is Standard_D14_v2
Test (fast) is Standard_D12_v2
Data, temp and logs are separated
Data is striped across 3 striped P20 or P30
temp on 4 striped P20 or P30
log on single P20 or P30

Note the 'problem' database is running always on Availability Group, we are wondering if that may be a factor here.

Best Answer

Rather than focusing on the hardware first, or the availability group even, it would be good to get a handle on what this "query" (test case) is waiting on.

To do that, you could run Paul Randal's script for Capturing wait statistics for a period of time, or sp_BlitzFirst from the open source First Responder's Kit. Or you could just poll sp_WhoIsActive periodically and see what the test query is waiting on in each sample.

Since I'm on 2016, I'm going to use sys.dm_exec_session_wait_stats because it's super easy to do, and it's 11:30 pm and I'm tired.

I ran your test query in SSMS against one of my AG boxes (finished in less than 2 seconds), grabbed the session ID, and then ran this:

SELECT *
FROM sys.dm_exec_session_wait_stats
WHERE session_id = @spid;

The results came out like this:

I ran this from a jump box, so that explains the small amount of ASYNC_NETWORK_IO. The other main wait there is SOS_SCHEDULER_YIELD, which makes sense - this was a CPU-bound task getting switched off the scheduler periodically (or maybe missing its quantum, this is a VM and there's other work going on as well including SSAS, AG stuff, production workload, scheduled jobs, etc). And it was only 141 ms during a 2000-ish ms query.

I ran the same test on my local instance, which is barely doing anything other than writing this answer (query finished in 1 second):

There were no significant waits at all.

The wait stats for your benchmark should tell the story in your situation.

Any answer we could give would likely just be speculation without the wait info (maybe there's other CPU load you're not aware of on the "slow server," or the server is experiencing thread exhaustion due to the AG or other queries, or there are "poison waits" like THREADPOOL or RESOURCE_SEMAPHORE, etc).

Best Answer

Related Solutions

Sql-server – CPU Time Slower in SQL Server 2014 than in SQL Server 2008

Sql-server – SQL Server 2014 Reporting Services issues

Related Question