Sql-server – SQL Server Performance Counters Relating to Disk Access and Log Files That Appear Concerning

performancesql server

I want to validate the optimal performance of our SQL Server and in doing so, I have gone through all the performance counters listed at the following link and create a trace log to analyze the data.

http://www.quest.com/backstage/images/promotions/SQLServer-Perfmonance-Poster.pdf

I have gone through three days worth of collected data and compared it to the recommended values listed in that chart, and though most of my values look good, there a just a few that are outside of the recommended values listed and I want to get some feedback from others on how their systems compare and if any of my values should raise a red flag.

As a point of clarification, our SQL Server is 2008, running on a virtual machine, with 4 cores and 8 GB of memory. The SQL VM connects directly, via iSCSI, to our SAN which has three separate disk groups (separate spindles) for the Database files, Log files and TempDB, and all are a RAID 1 configuration.

Here are the counters that I am concerned by:

  1. For the SQLServer:Databases\Log Flush Wait Time counter, it is staying fairly constant between 600 ms and 900 ms with an average of 750ms. According to the chart listed above, it states that this value should be around 0 ms. Why would my value be significantly higher?

  2. For SQLServer:Databases\Log Growths, I'm at 325 and for SQLServer:Databases\Log Truncations, I'm at 14200. I assume that this is since the start of SQL Server, correct? If so, the chart says that growths should be around 0. Is this something I should worry about? The last reboot of the server was a month ago and there are about 100 databases on the server.

  3. For the Memory\Pages Input/sec counter, the values fluctuate dramatically and have an average of 6 but have many spikes well upwards of 100 (some as high as 400). According to the chart, this value should be less than 10. Should I be concerned by these spikes?

  4. Finally for the PhysicalDisk\Avg. Disk sec/Write counters:

    • 4a. For the TempDB volume, I get dramatic spikes every few minutes that are upwards of 100 ms. The guide states that these values should be less than 20 ms. Should I be concerned by these tenth of a second spikes?

    • 4b. For the Log Files volume, it stays almost steady at 8 ms. Is it normal for the Log Files to have consistent writes with no break?

    • 4c. Regarding these disk counters, in Windows Disk Management > Properties > Policies, the disks don’t have the “Enable write caching on the device” checked nor the “Turn off Windows write-cache buffer flushing on the device” checked. Should these be enabled for a SAN that has a battery backed cache? Could this be the cause of 4a and 4b? If so, can they enabled during production hours without interrupting data flow?

Any input in regards to what others see on their production systems or if any of these values seem abnormal are appreciated.

Best Answer

SQLServer:Databases\Log Flush Wait Time counter, it is staying fairly constant between 600 ms and 900 ms with an average of 750ms

750ms wait time per flush is huge. It means every transaction statement must wait in average 750ms to commit. If you don't use explicit transactions for writes it means each write statement (INSERT/UPDATE/DELETE) must wait 750ms to complete. However this does not corroborate with the other sttaement 'PhysicalDisk\Avg. Disk sec/Write ... For the Log Files volume, it stays almost steady at 8 ms.'. One explanation is that you have log growths (you do, the counters say so) which skew the log flush wait times but that would only explain averages over long periods.

SQLServer:Databases\Log Growths, I'm at 325

Figure out which database is growing and resize the log appropriately. You should not occur growth events in production, they are extremely expensive. the counter has instances per database.

PhysicalDisk\Avg. Disk sec/Write ... For the Log Files volume, it stays almost steady at 8 ms. Is it normal for the Log Files to have consistent writes with no break?

Yes. A transaction cannot commit unless the log was written to disk. As long as you have transactions, you will have log disk writes.

PhysicalDisk\Avg. Disk sec/Write counters: For the TempDB volume, I get dramatic spikes every few minutes that are upwards of 100 ms.

It would be good to know why. Can you correlate it with a specific activity that causes the spike? Can you correlated it with database checkpoints occurring?

Memory\Pages Input/sec counter, the values fluctuate dramatically and have an average of 6 but have many spikes well upwards of 100 (some as high as 400).

So you're paging. Why? Insufficient memory? Any other offending process on the server host? Virtualized overcommitted environment? Only you can tell. Does swapping IO overlap with any of the IOs above (same physical path, ie. same disks)? Then some of the SQL timers may be influenced by this paging spikes.