Sql-server – SQL dumping all pages from buffer cache every few minutes

cachememorysql serversql-server-2012

I have a single SQL2012 SP4 node running several databases.

The server has 20GB of memory available, 14GB allocated to SQL (nothing else running on the box).

Every few minutes SQL dumps the whole of the buffer cache. Page life expectancy hits zero, the buffer cache descriptors show that there's nothing in the cache.

I had a look at the resource monitor notifications and the notifications are bouncing around from high/steady/low every few milliseconds:

RESOURCE_MEMPHYSICAL_HIGH
RESOURCE_MEM_STEADY
RESOURCE_MEMPHYSICAL_LOW

With timestamps that are several milliseconds apart. The PLE is essentially a saw-tooth pattern.

I've seen this happen before with SQL2012 SP1 and this question:

SQL Server 2012 Free Pages in Buffer Cache Not Being Used

Seems to be a similar issue, though I've already updated to SP4.

I've tried turning on LPIM for the service account and I've tried messing with the max memory setting. Lowering max memory seems to have caused the buffer cache to empty more frequently.

Any ideas of what to check next?

Server workload is literally nothing (I'm scrolling through lists of items in an ERP system and it gets to about 40-50MB before the cache just drops again).

It's interesting because I upgraded from SP1 to try and fix this – the cache there was getting to around 500MB. Since then I dropped the max memory setting to 14GB which seems to have made it worse.

I'm wondering if Windows is panicking and throwing incorrect notifications for memory pressure at SQL – it follows that the server with max memory set to unbounded seemed to run ok-ish but never fill the cache more than a few hundred MB – but now it barely gets to 50…

More Info: for those that asked

Number of cores: 4

Database Size: 80GB

Error log shows: A significant part of sql server process memory has been paged out. This may result in a performance degradation. Duration: 0 seconds. Working set (KB): 247928, committed (KB): 495656, memory utilization: 50%.

Results of running scripts from this link: https://www.sqlskills.com/blogs/jonathan/identifying-external-memory-pressure-with-dm_os_ring_buffers-and-ring_buffer_resource_monitor/

Results of memory pressure query

Not sure how to interpret these – it looks like there is both internal and external memory pressure at various times..

Even more info:

This is a Hyper-V guest sitting on a host with 96GB of total ram of which about half is allocated to the guests.

The symptoms seem similar to this:

SQL Server 2012 x64 – cannot safely allocate more than 50% RAM

However, when I allocated 14GB to SQL the symptoms kicked in immediately (barely 3GB of the server memory was committed)

Last night I bumped the guest memory to 32GB and the issue went away but I see only 14GB commit of total server memory (and the business that runs the DB is busy this morning and this is when they usually have their performance issues).

About 8-9GB of data in the cache at the moment, seems to be stable.

It seems to suggest that 20GB is enough for the workload on this box. I'm happy leaving it with 32GB for now but I'd really like to get to the bottom of this so I can better configure the VMs/SQL.

I'll keep digging and update if I find the answer!

Even more more info:

I didn't restart SQL after turning on LPIM (not realising that was a requirement) but I did leave this setting on and restart to upgrade the memory so now I'm not sure if the increase in memory or LPIM has alleviated the issues.

Will jump on tonight when the server is idle and check how it looks at 20GB again.

Even Further More More Info:

Currently the server is ticking along ok with 32GB allocated and we've not seen the problem since. If this crops up again I'll come back to this question and keep digging.

Currently remains a mystery but my guess is that I'm only masking the issues at the moment.

Best Answer

Although you seem to have solved the issue yourself, here is a summary of the relevant information surrounding the solution.

Server Memory Server Configuration Options

Microsoft write in their article Server Memory Server Configuration Options(Microsoft | SQL Docs) for the section Setting the memory options manually

(emphasis mine)

Also, setting a min_server_memory value is essential in a virtualized environment to ensure memory pressure from the underlying host does not attempt to deallocate memory from the buffer pool on a guest SQL Server virtual machine (VM) beyond what is needed for acceptable performance.

The section regarding Lock Pages in Memory (same document) has an equalling compelling paragraph which reads as:

(emphasis mine)

This Windows policy determines which accounts can use a process to keep data in physical memory, preventing the system from paging the data to virtual memory on disk. Locking pages in memory may keep the server responsive when paging memory to disk occurs. The Lock Pages in Memory option is set to ON in instances of SQL Server Standard edition and higher when the account with privileges to run sqlservr.exe has been granted the Windows Lock Pages in Memory (LPIM) user right.

The section LPIM goes on to explain that:

(emphasis mine)

Setting this option does not affect SQL Server dynamic memory management, allowing it to expand or shrink at the request of other memory clerks. When using the Lock Pages in Memory user right it is recommended to set an upper limit for max server memory as detailed above.

... and in an important comment that:

(emphasis mine)

Setting this option should only be used when necessary, namely if there are signs that sqlservr process is being paged out. In this case, error 17890 will be reported in the Errorlog, resembling the below example:

A significant part of sql server process memory has been paged out. 
This may result in a performance degradation. Duration: #### seconds. 
Working set (KB): ####, committed (KB): ####, memory utilization: ##%.  

Starting with SQL Server 2012 (11.x), trace flag 845 is not needed for Standard Edition to use Locked Pages.

Solution

Based on the above findings and your observations the solution to your problem would be to configure the following settings:

  1. min_server_memory (5-10 GB?) based on your comment:

    About 8-9GB of data in the cache at the moment, seems to be stable.

    ...and Microsoft's recommendation of setting a min_server_memory.

  2. max_server_memory (20-32 GB) based on your observation:

    It seems to suggest that 20GB is enough for the workload on this box. I'm happy leaving it with 32GB for now but I'd really like to get to the bottom of this so I can better configure the VMs/SQL.

    ...and Microsoft's recommendation of setting a max_server_memory.

  3. Lock Pages in Memory : Enabled for SQL Server Service Account.
    Based on the ERRORLOG entry of your SQL Server instance you mentioned and Microsoft's reference in the article.

    Setting this option should only be used when necessary, namely if there are signs that sqlservr process is being paged out.

Before You Continue...

(One of) the benefit(s) of having a virtualised environment is that resources can/should be shared, and possibly even over-commited. However, turning on Lock Pages In Memory (LPIM) could have a negative impact on your Hyper-V environment, if your hardware is hosting multiple instances. An over-commitment of RAM could exhaust other instances.

Before you consider switching all the levers, start off with settings 1. and 2. and if fine tuning these memory settings doesn't work, then consider turning on LPIM if you have enough hardware.