Sql-server – Disk I/O and PAGEIOLATCH_XX

We have been having CPU issues with one of our servers lately and while we have been looking into this we have also noticed queries running slowly with waits of PAGEIOLATCH_XX. In particular, a reindex job is seemingly always having this wait type.

In response, I have run a collect against sys.dm_io_virtual_file_stats and then broken this down into time chunk and worked out the average stall per operation. While there are spikes mostly, the disk seems to have a value of regularly under 20 ms. From what I remember, 20 ms is the recommended value(?).

Further to this I have run Glenn Barry's script:

select db_name(database_id) as DatabaseName, file_id
,io_stall_read_ms
,num_of_reads
,cast(io_stall_read_ms/(1.0+num_of_reads) as numeric(10,1)) as 'avg_read_stall_ms'
,io_stall_write_ms
,num_of_writes
,cast(io_stall_write_ms/(1.0+num_of_writes) as numeric(10,1)) as 'avg_write_stall_ms'
,io_stall_read_ms + io_stall_write_ms as io_stalls
,num_of_reads + num_of_writes as total_io
,cast((io_stall_read_ms+io_stall_write_ms)/(1.0+num_of_reads +
num_of_writes) as numeric(10,1)) as 'avg_io_stall_ms'
from sys.dm_io_virtual_file_stats(null,null) --where db_name(database_id) = 'tempdb'
order by [DatabaseName] desc'

Which calculates the average I/O stall also and this also confirms stalls less than 20 ms.

I have also looked in the following to see if any pending tasks are taking longer than recommended, but this isn't throwing up any pending I/O operations taking regularly longer than 20 ms.

SELECT db_name(database_id) as 'Database',
file_name(file_id) as 'File',
io_stall,
io_pending_ms_ticks
FROM sys.dm_io_virtual_file_stats(NULL, NULL) iovfs,
 sys.dm_io_pending_io_requests as iopior
WHERE iovfs.file_handle = iopior.io_handle

My question now is: If the issue is not disk related, why am I seeing lots of PAGEIOLATCH_XX waits? In particular, why is the reindex running extremely slowly with this wait type?

Could this be related to CPU pressure?

================================================================================

I just wanted to update the thread. After doing more analysis I have tracked down a particular proc that is causing significant reads. The proc is as follows:

ALTER PROCEDURE [dbo].[GetActiveSessionCount]
    @SessionCount   INTEGER OUTPUT
AS
SET NOCOUNT ON
BEGIN
    DECLARE @Error              INTEGER,
            @RowCount           INTEGER,
            @nExpireAfter       INTEGER
    SELECT  @nExpireAfter = ExpireSessionsAfter FROM KSYSTEM
    SELECT @Error = @@ERROR, @RowCount = @@ROWCOUNT
IF(1 <> @RowCount)
BEGIN
RAISERROR (50003, 15, 1, 'GetActiveSessionCount')
RETURN 50003
END
    IF (0 <> @Error)
    BEGIN
        RETURN @Error
    END
    SELECT  @SessionCount = COUNT(SessionID)
    FROM    KSESSION  WITH (NOLOCK)
    WHERE
    (
        (
            Expirable = 0
        )
        OR
        (
            Expirable = 1
            AND
            (   --SessionID IS NOT NULL)
                EXISTS (SELECT SessionID FROM KFILESAWAITINGCOMMIT fac WITH (NOLOCK) WHERE SessionID = fac.SessionID)
                OR
                (
                    LastAccessDateTime IS NOT NULL
                    AND GETDATE() <= (DATEADD(minute, @nExpireAfter, LastAccessDateTime))
                )
            )
        )
    )

SELECT @Error = @@ERROR
IF(@Error <> 0)
BEGIN
RETURN @Error
END
    RETURN 0
END

Using STATISTICS IO I can see the problem line is

SELECT SessionID FROM KFILESAWAITINGCOMMIT fac WITH (NOLOCK) WHERE SessionID = fac.SessionID

Looking at the execution plan it is doing a Clustered Index Scan. Now there is an non clustered index on that table already specifically for SessionID however it is not being used.

What I am finding in testing is if I run that SELECT by itself then it uses the non clustered index and performs well. But if I use a hint in the proc to force it to use the non clustered index, then it actually performs worse.

Can anyone explain?

Best Answer

PAGEIOLATCH_XX waits are logged by SQL Server when it is waiting for data to be read from the disk. Index maintenance is a notoriously intensive operation and because of this it should be performed at your quietest times to avoid any impact on production.

You mention you have queries that are causing the same waits. If this is at the same time as the index maintenance then that is not that odd but if it's happening at other times it could be down to memory pressure (not enough room in RAM to store pages so they need to be read from the disk again), large scans or it could even indicate there is a potential problem with your disks. More investigation is needed to rule each of these out.

Best Answer

Related Solutions

Sql-server – Oracle GoldenGate add trandata errors

Sql-server – High Disk I/0 Issues

Related Question