Sql-server – What would cause an elastic pool to exhibit high “Sql Server process core percent” metrics, but low DTU / CPU?

azure, azure-sql-database, sql server

Backstory:

We have an Azure SQL elastic pool with several databases in it and around 750 GB of data.
We've been having sporadic outages that correlate with the metric "Sql Server process core percent" hitting a max of 100% for a sustained period.

Normally, the process core percent metric tracks the CPU percentage metric pretty closely. However, during these outages, the correlation is actually inverse: the CPU metric (max aggregation) bottoms out while the process core percent (max aggregation) peaks.

If I use average as the aggregation instead, the two metrics continue to track each other.

TL;DR:
What would cause the "Sql Server process core" metric to max out independently of the CPU metric in an Azure SQL elastic pool?

Best Answer

A DTU actually combines three different things into one number (CPU, memory and disk), and even disk has more than one limit: you get a fixed disk size, but also a maximum on IOPS and on throughput. I would recommend checking each of these metrics individually to see what they show during the outage window. Also look at connection failures and the other connection-related metrics. For a trend over time, the average aggregation is a good view, but when you want to know what happened in a particular minute, look at both average and max.
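To illustrate why average and max can tell different stories for the same time window, here is a minimal Python sketch. The sample values and the `aggregate` helper are made up for illustration; they are not real pool telemetry and not an Azure API.

```python
from statistics import mean


def aggregate(samples, bucket_size):
    """Split samples into consecutive buckets and return (avg, max) per bucket."""
    buckets = [
        samples[i:i + bucket_size]
        for i in range(0, len(samples), bucket_size)
    ]
    return [(mean(bucket), max(bucket)) for bucket in buckets]


# Hypothetical utilization percentages sampled within one reporting window.
steady = [40, 42, 38, 40]   # consistently moderate load
spiky = [5, 5, 100, 5]      # mostly idle with one short saturation spike

steady_stats = aggregate(steady, bucket_size=4)
spiky_stats = aggregate(spiky, bucket_size=4)

for name, stats in (("steady", steady_stats), ("spiky", spiky_stats)):
    for avg, peak in stats:
        print(f"{name}: avg={avg:.2f} max={peak}")
```

In this sketch the spiky series has the lower average but the higher max, which is why a short saturation event can be invisible in an average chart and obvious in a max chart.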

More details on the DTU model can be found here.