I've been arguing with a DBA and a couple hardware guys about performance issues on our SQL server. Normally everything is fine, however over the past few weeks we have been having huge lag spikes in sql server. Its clear that SQL Server is waiting on disk I/O. But I keep getting told that it is beacuse SQL Server is asking for abnormally high I/O. Which isn't the case. I can see from what is running that there is nothing out of the normal, and all the DBA cares to look at is what is causing the blocking and so on, which is useless. For instance the major thing we see backing up is operation on the ASPState database, which we are using to manage the ASP Session State on the web servers. These operations are normally never seen on Sp_who2 active results because they occur so quickly. The database is in simple recovery mode and logging is miminal. However during these lag spikes we can see alot select and update operations on the database being blocked or waiting. I'm sure what is going on is that someone or some job is running something that is causing heavey disk usage on the raid arrays used for that databases log and data files. The problem is proving it, since no one wants to admit they are doing something that is killing our website.
My question is what performance counters or whatever can I log that will help show that SQL server is waiting on I/O, but not because its asking for more than normaly, instead beacuse the disk is to busy to respond to the requests from sql server as quickly as it normally would?
Best Answer
Have a look at the following perfmon counters:
Page lookups/sec
Page reads/sec
Readahead pages/sec
Full Scans/sec
Range Scans/sec
Skipped Ghosted Records/sec
Page IO latch waits
SQL Server driving a high number of IO requests would be corroborated with a high number scans, increase in page lookups and page reads and high page IO latch waits. Is worth trying a look at
sys.dm_exec_query_stats
for entries with high physical reads counts. They could quickly pinpoint the culprit.In general approaching the problem as a performance troubleshooting problem, following a methodlogy like Waits and Queues is the right approach. You DBA seems to be doing the right thing so you should listen to him.