Sql-server – find out current disk response time in MS SQL Server using DMOs

sql server

using sys.dm_io_virtual_file_stats DMV I get io_stall which return response time in ms, but it is an accumulated result. Using Activity Monitor Data File I/O Pane, in the Response Time(ms) column we get aggregated but instant results. Though this column points to a corresponding field in the DMV, the results widely differ. In Activity Monitor there are relatively small numbers. I want to know how can I get that same instant response time via script. Is it possible to get that information in SQL Server using tsql or should I use server monitoring tools?

Best Answer

What is Activity Monitor Doing

You're right, those values from sys.dm_io_virtual_file_stats accumulate the entire time the SQL Server service is running:

The counters are initialized to empty whenever the SQL Server (MSSQLSERVER) service is started.

Activity Monitor is just taking a snapshot of the results, waiting 10 seconds, taking another snapshot, and then showing you the difference. This is why the numbers are small.

It can be useful to view these diffs, or log them to a table on a schedule, in order to help you know what's "normal" and what's "slow" for your server. Some tools that can help you do this:

If your goal is to see the same numbers from Activity Monitor via T-SQL, I would go with one of these options.

Disk Response Time vs File Response Time

The sys.dm_io_virtual_file_stats dynamic management function returns these stalls (AKA latency or disk response time) on a per-file basis. If you're looking for a pattern of slow response times, or a way to get latency, at the disk / LUN level, you'll have to be careful to consider which disk each of these files is on.

To the direct point of your question, SQL Server is not a great source of information about disk-level I/O issues.

What is Slow?

You'll see varying rules of thumb online, here's a common one (from SQLPerformance.com):

If you ask SQLskills, we will tell you something along the lines of:

  • Excellent: < 1ms
  • Very good: < 5ms
  • Good: 5 – 10ms
  • Poor: 10 – 20ms
  • Bad: 20 – 100ms
  • Really bad: 100 – 500ms
  • OMG!: > 500ms

If you do a Bing search, you will find articles from Microsoft making the recommendations similar to:

  • Good: < 10ms
  • Okay: 10 – 20ms
  • Bad: 20 – 50ms

Take those values with a grain of salt though. What matters is that performance is good enough, not whether a number has a certain value. Make sure to treat the patient, not the monitor.

Outside of SQL Server

To get truly useful information about about disk I/O, you'll really want to go outside of SQL Server. Windows Performance Monitor (PerfMon) is a good way to see some of these stats in "real time." You can see them aggregated across the whole machine, or by disk. The counters you want are Avg. Disk sec/Read and Avg. Disk sec/Write, which are both part of the Physical disk object.

I've heard that xperf is also a good tool for investigating this. I haven't used it, but I'll refer you to an excellent Microsoft blog post that provides a lot of details on usage and analysis: Analyzing Storage Performance using the Windows Performance Analysis ToolKit (WPT)

Maybe It's Not the Disk

It's important to note that high disk response times are not necessarily due to a "slow disk." As the MS blog post above mentions, It could be SAN issues (you might have unplugged or low throughput cabling, downgraded to slower disks than expected on the SAN, the network might be overloaded - even by traffic from other servers and sources - causing slow response times, etc).

It could be that the workload is doing too much I/O, and the disk just can't keep up. This type of problem can potentially be solved with query and index tuning, rather than trying to improve or upgrade a disk.

It could even be high CPU or oversubscribed VMs, or many other things.