SQL Server – Why does VolumeQueueLength go high for SQL Server in AWS?

sql-server storage

We have a SQL Server instance hosted on an AWS EC2 machine. In the monitoring tab of one of its volumes, we see that VolumeQueueLength periodically spikes to around 32 and then drops back.

Before troubleshooting why it reaches 32, I'd like answers to a few questions of my own:

  1. What factors does VolumeQueueLength depend on? Does it depend on the volume's maximum IOPS capacity, the IOPS happening at that moment, the volume's throughput capacity, the current throughput, or anything else as well?

  2. How do we correlate all of these and draw some meaningful conclusion from them? For example, are we bottlenecked on IOPS or on throughput? For instance, I have the following data:

At 2:40 UTC
VolQueueLength = 31.6
VolumeReadOps = 912k
VolumeWriteOps = 8.78k
VolumeReadBytes = 41.2k
VolumeWriteBytes = 15.5k
Vol Type: gp2
Vol IOPS: 6144

Any help or input would be greatly appreciated. As a DBA, I need a strong foundation in storage.

Best Answer

A disk queue length of 32 for a period of time on modern storage systems is most likely insignificant.

To correctly interpret disk queue length, you need to know how many physical disks are involved. If the volume is spanned across 32 physical disks, for example, then the average queue per disk is just 1. Since you likely can't determine how many physical disks actually back the volume, it's difficult, if not impossible, to know what queue length number should be alarming until you can correlate it with SQL Server wait statistics showing that it is waiting on I/O. Also, a high disk queue length can simply mean that the load is high; as long as performance is acceptable, there's nothing to fix, and you can just note the queue length as a benchmark.
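As an illustration, the per-disk arithmetic above, together with converting the CloudWatch counters from the question into average rates, can be sketched in Python. This assumes a 5-minute (300-second) metric period, which is the default for basic EBS monitoring; the disk count is a hypothetical value, since in practice you can't know it:

```python
# Sketch: interpreting the CloudWatch numbers from the question.
# Assumes a 5-minute (300 s) metric period (the EBS basic-monitoring default).

PERIOD_SECONDS = 300

queue_length = 31.6        # VolumeQueueLength (avg outstanding I/Os)
read_ops = 912_000         # VolumeReadOps  (total count per period)
write_ops = 8_780          # VolumeWriteOps (total count per period)
provisioned_iops = 6144    # the gp2 volume's IOPS limit

# Average IOPS over the period: total operations / period length.
avg_iops = (read_ops + write_ops) / PERIOD_SECONDS

# If the volume were striped over N physical disks, the average
# queue per disk would be queue_length / N.
disks = 32                 # hypothetical: you normally can't know this
queue_per_disk = queue_length / disks

print(f"avg IOPS       : {avg_iops:.0f}")
print(f"IOPS headroom  : {provisioned_iops - avg_iops:.0f}")
print(f"queue per disk : {queue_per_disk:.2f}")
```

By this arithmetic the volume would be averaging roughly half its provisioned IOPS over the period, but with numbers like the ones in the question (note the byte counters look implausibly low next to the op counters), the first step would be to confirm the period and the statistic (Sum vs. Average) actually configured in CloudWatch.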

Generally you need to first look at SQL Server waits to determine if there is a bottleneck, and what the bottleneck is. Check out articles such as Resolving SQL Server Disk IO bottlenecks to get started on this subject. If you see I/O waits increasing, then you'll probably also see the disk queue length increasing.
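A minimal sketch of that correlation step in Python follows. The wait-type names (PAGEIOLATCH_*, WRITELOG, IO_COMPLETION) are real SQL Server I/O-related wait types, but the sample numbers are invented for illustration; in practice you would read these rows from sys.dm_os_wait_stats:

```python
# Sketch: estimate what share of SQL Server waits are I/O-related.
# Wait-type names are real; the sample rows are invented for illustration.
# In practice, query sys.dm_os_wait_stats for (wait_type, wait_time_ms).

IO_WAIT_PREFIXES = ("PAGEIOLATCH_", "WRITELOG", "IO_COMPLETION")

wait_stats = [
    ("CXPACKET",            120_000),
    ("PAGEIOLATCH_SH",      450_000),
    ("PAGEIOLATCH_EX",       90_000),
    ("WRITELOG",             60_000),
    ("SOS_SCHEDULER_YIELD",  30_000),
]

total_ms = sum(ms for _, ms in wait_stats)
io_ms = sum(ms for wt, ms in wait_stats if wt.startswith(IO_WAIT_PREFIXES))

# A high and growing share of I/O waits is what actually confirms a
# disk bottleneck; the queue length alone does not.
print(f"I/O waits: {io_ms} ms ({io_ms / total_ms:.0%} of total)")
```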

So that answers question 2. As for question 1, it's simply a matter of how many disk requests are waiting to be serviced. With faster disks, or more disks, you can throw a higher load at the volume before the disk queue starts to grow. And there are many other factors: for example, if you have only one slow CPU, the application won't be able to process much data, so you'll see almost no disk queue, yet performance will still be horrible.
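One way to make question 1 concrete is Little's Law: the average queue length is roughly the arrival rate (IOPS) times the average service time per request. A rough sketch, where the latency figures are illustrative assumptions rather than measurements:

```python
# Little's Law sketch: queue_length ~= arrival_rate * avg_latency.
# Latency values below are illustrative assumptions, not measurements.

def expected_queue(iops: float, latency_s: float) -> float:
    """Average number of in-flight I/Os under steady load."""
    return iops * latency_s

# Same load on slower vs. faster storage:
print(expected_queue(3000, 0.010))  # 10 ms per I/O -> queue around 30
print(expected_queue(3000, 0.001))  # 1 ms per I/O  -> queue around 3
```

This is why the same queue length can be benign on one storage system and alarming on another: the queue reflects both how much work is arriving and how quickly the disks can service it.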