Sql-server – SQL Server 2008 R2 Performance Degradation – High Lazy writes/sec and free list stalls/sec

performancesql server

I am supporting a customer of ours who has an enterprise edition of our web application living on virtualized servers at their location. They have 3 web servers and 1 DB server. All 4 servers are virtualized with vCenter and have Windows Server 2008 R2 installed.

The DB server is running SQL Server 2008 R2 with 36 GB of RAM, and separate drives for the DB, logs, backups, etc. (7 virtual in total) on a SAS setup, though the backups occur onto a SSD drive.

The web servers are 4 GB RAM each and load balanced. We have a few processes that import data from third parties into our application for end-users, but recently one of these processes have bottle-necked on the server and caused issues. A process that should take hours takes days, and often times fails out, so I have to manually queue up the process in SQL and let it run.

After much investigation, I'm at a loss as to what the issue it. I've confirmed all of the Web Application settings are the same as our hosted environment, which houses customers much larger than this and our DB server is more robust, but the overall load onto the server is great.

Some metrics that I have included cause me to worry about the processes that are actually occurring on the server. I've included these below but I am specifically worried about the Lazy writes/sec and free list stalls/sec, as to me this is indicating that the process is overloading the server in memory and dumping the page to hard disk, which would drive down performance. Is this correct? Can anyone give me ideas on where to go from here? The customer definitely is not liking the idea of adding more resources to the DB server, so I want to be able to prove, definitively, where the issue is at.

Since this is in a virtualized environment, could it also be that the resources just happen to be shared, driving down the performance of the VM and causing this to be flushed to the drive? Any help would be greatly appreciated. My hardware experience has met its limits and I am not a DBA, so I am trying to wrap my head around what all actually goes on in the background with SQL Server.

Thanks!

-Buffer cache hit ratio           1363657/sec
-Buffer cache hit ratio base      1363687/sec
-Page lookups/sec                 28043473454/sec
-Free list stalls/sec             621/sec
-Free pages                       1438/sec
-Total pages                      3932160/sec
-Target pages                     3932160/sec
-Database pages                   3846600/sec
-Reserved pages                   0/sec
-Stolen pages                     84122/sec
-Lazy writes/sec                  77354/sec
-Readahead pages/sec              15305687/sec
-Page reads/sec                   16859120/sec
-Page writes/sec                  7751703/sec
-Checkpoint pages/sec             5408194/sec
-AWE lookup maps/sec              0/sec
-AWE stolen maps/sec              0/sec
-AWE write maps/sec               0/sec
-AWE unmap calls/sec              0/sec
-AWE unmap pages/sec              0/sec
-Page life expectancy             16434/sec

Best Answer

This sounds like VMware might be using the balloon driver to remove memory from the SQL Server VM.

Ensure the VM is configured with locked memory and a CPU reservation.

Typically, for a SQL Server, you want reserved memory to match the amount of memory assigned to the VM. You'd want to ensure SQL Server is configured with a valid max server memory setting.

Check "Architecting SQL Server on vSphere for some great best-practices from VMware.

Once you've disabled the balloon driver, if you keep seeing a high ratio of stolen pages, check to see what memory clerks are using more than 16MB memory with this query:

SELECT MemoryClerkName = domc.name
    , MemoryClerkType = domc.type
    , Pages = SUM(domc.pages_kb / (domc.page_size_in_bytes / 1024))
    , MB = SUM(domc.pages_kb * 1024) / 1048576
FROM sys.dm_os_memory_clerks domc
WHERE domc.pages_kb > 16384 /* clerks using more than 16MB */
GROUP BY domc.type
    , domc.name
ORDER BY SUM(domc.pages_kb) DESC;

The "Default / MEMORYCLERK\SQLBUFFERPOOL" should typically be the main consumer of memory.

This query will show how much memory SQL Server has allocated, how much it wants to be able to allocate, and how much physical memory is present in the machine:

SELECT TargetMB = dosi.committed_target_kb * 1024 / 1048576
    , CommittedMB = dosi.committed_kb
    , PhysicalMB = dosi.physical_memory_kb * 1024 / 1048576
FROM sys.dm_os_sys_info dosi

This will show you what Windows is reporting for memory state:

SELECT dosm.system_memory_state_desc
FROM sys.dm_os_sys_memory dosm

If there is lots of free memory in your system, you'll see "Available physical memory is high" reported. If you see "Available physical memory is low", you'd probably benefit from adding memory to the server.