SQL Server – Troubleshooting Page Life Expectancy Fluctuations

page-life-expectancysql serversql-server-2008-r2

For months now, the PLE on one of our servers has hovered around the 2 million seconds. If varied slightly from day to day but was pretty stable there.

This last weekend, we added 12 GB of virtual RAM and 1 virtual CPU core to the server. We did not change the maximum used RAM in SQL Server to match the new RAM nor did we allocate the new CPU core to SQL Server.

Since this was done, our PLE has fluctuated wildly, going between 50 and 4 million seconds every 10-30 minutes. The changes are not a slow rise or fall. The metrics go straight from very low to very high and vice versa in less than a minute.

Our overall wait times for the server are fine. Latches are normal. Buffer and plan cache sizes haven't changed. There doesn't seem to be any consistent pattern of a specific query or type of query draining the resources.

I've never seen PLE do this before. Can someone point me to what I may be missing or need to look deeper into?


Additional information from comments:

  • We are at 5 CPUs total but only using 3 (we were at 4 using 3).
  • Our total memory is 49GB and SQL's max is 28GB.
  • We are using VMWare with an x64 OS (Windows 2008).
  • There are 14 User databases on the server with the primary one being around 250GB.
  • Buffer Cache Hit Ratio has stayed around 98+% since this all began.
  • The server power plan is set to Balanced (not High Performance); however, that has not changed in several years. With that said, I completely agree it should be High Performance.
  • Neither the SQL Server Error nor Windows Event Logs are showing anything out of the ordinary.
  • The activity on the server has not changed in the last several weeks.
  • The server is NUMA aware. The MAXDOP is 4, with a cost threshold of 10.

Best Answer

We bumped the memory from 28GB (the original amount) to 40GB, leaving 8GB of memory for the OS and other processes. Immediately afterward everything returned to normal and has stayed stable. One of our DBAs speculated that SQL Server was confused about how much memory it really had available. I had checked the Total Server Memory both before and after and the numbers were consistent with I see in Server Properties but I find that assertion difficult to argue against