This is an awesome post.
To answer your final question, I'd speculate that the answer is "yes".
That said, I probably would have pursued soft-NUMA before resorting to the trace flags. I think you are right about the NUMA node allocation, and that could be at the root of your problem. Via soft-NUMA, you could scale the requests out across your NUMA nodes (4, if that's the correct count) and then assign each host, via IP address, to a specific NUMA node. In addition to that, I'd disable hyper-threading. Combined, the issue would likely decrease, though at the cost of fewer schedulers.
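For reference, soft-NUMA on that vintage of SQL Server is configured through the registry plus a port-to-node mapping. This is a rough sketch only, assuming a default instance, four soft-NUMA nodes of four cores each, and illustrative CPU masks; the version key and masks must match your actual build and core layout:

HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\NodeConfiguration\Node0  CPUMask (DWORD) = 0x0000000F
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\NodeConfiguration\Node1  CPUMask (DWORD) = 0x000000F0
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\NodeConfiguration\Node2  CPUMask (DWORD) = 0x00000F00
HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Microsoft SQL Server\100\NodeConfiguration\Node3  CPUMask (DWORD) = 0x0000F000

Then, in SQL Server Configuration Manager (TCP/IP properties), map one port per node, e.g. a TCP Port value of 1433[0x1],1434[0x2],1435[0x4],1436[0x8], and point each application host at the port of "its" node. A service restart is required for the node configuration to take effect.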
On a separate thought, I'd look at forced parameterization - the fact that your load is driving your CPU so high is very interesting, and it may be worth looking into that.
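That setting is per database; a minimal sketch, with the database name as a placeholder:

ALTER DATABASE [YourAppDb] SET PARAMETERIZATION FORCED;

-- verify the setting took
SELECT name, is_parameterization_forced
FROM sys.databases
WHERE name = N'YourAppDb';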
Lastly, on multi-NUMA-node systems, I typically have the output of the following queries dumping to a table every N seconds (a minimal polling loop is sketched after the queries). It makes for some interesting analysis when workloads change or trace flags are implemented:
SELECT GETDATE() AS poll_time, node_id, node_state_desc, memory_node_id, online_scheduler_count, active_worker_count, avg_load_balance, idle_scheduler_count
FROM sys.dm_os_nodes WITH (NOLOCK)
WHERE node_state_desc <> N'ONLINE DAC'
and
SELECT TOP 10 GETDATE() AS sample_poll, wait_type, COUNT(*) AS task_count
FROM sys.dm_os_waiting_tasks
WHERE [wait_type] NOT IN
('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK','SLEEP_SYSTEMTASK',
'SQLTRACE_BUFFER_FLUSH','WAITFOR', 'BROKER_TASK_STOP',
'BROKER_RECEIVE_WAITFOR', 'OLEDB','CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT' )
GROUP BY wait_type
ORDER BY COUNT(*) DESC
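The polling loop itself can be as simple as the sketch below; the table name dbo.node_poll is hypothetical, the 30-second delay stands in for N, and in practice I'd run it from an Agent job rather than an open session:

CREATE TABLE dbo.node_poll (
    poll_time datetime NOT NULL,
    node_id smallint,
    node_state_desc nvarchar(256),
    memory_node_id smallint,
    online_scheduler_count smallint,
    active_worker_count int,
    avg_load_balance int,
    idle_scheduler_count smallint
);

WHILE 1 = 1
BEGIN
    INSERT INTO dbo.node_poll
    SELECT GETDATE(), node_id, node_state_desc, memory_node_id,
           online_scheduler_count, active_worker_count, avg_load_balance, idle_scheduler_count
    FROM sys.dm_os_nodes
    WHERE node_state_desc <> N'ONLINE DAC';

    WAITFOR DELAY '00:00:30'; -- N = 30 seconds here
END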
So there are a couple of ways this can happen.
- The queries in the blocking database reference objects within the blocked database.
- The queries are waiting for tempdb to allocate pages, and one query is blocking another query (a quick check for this is sketched after this list).
- There is memory pressure and queries are waiting for memory to be allocated so they have to wait their turn in the memory allocator.
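For the tempdb case, a quick check while the blocking is happening is to look for PAGELATCH waits on tempdb allocation pages; a minimal sketch (the 2:1:1 / 2:1:2 / 2:1:3 interpretation is the usual PFS/GAM/SGAM rule of thumb):

SELECT session_id, wait_type, wait_duration_ms, resource_description
FROM sys.dm_os_waiting_tasks
WHERE wait_type LIKE N'PAGELATCH%'
  AND resource_description LIKE N'2:%'; -- database_id 2 = tempdb; resources such as 2:1:1 (PFS), 2:1:2 (GAM), or 2:1:3 (SGAM) point to allocation contention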
If you download and install sp_whoisactive on the server, it'll give you a lot more information about why things are blocking if this happens again.
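For example, a call along these lines surfaces the blocking chain directly; the parameters are sp_whoisactive's documented options, and this is just one reasonable combination:

EXEC dbo.sp_WhoIsActive
    @find_block_leaders = 1,                       -- roll blocked-session counts up to the head blocker
    @get_task_info = 2,                            -- pull detailed per-task wait information
    @sort_order = '[blocked_session_count] DESC';  -- head blockers first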
Given your note about the memory pressure, you were probably suffering from either #2 or #3 on my list above. Odds are it was memory not being able to be allocated fast enough, but it's possible that, because you didn't have enough memory, work was spilling to tempdb faster than tempdb could keep up.
Best Answer
First and foremost: the SQL Server in the picture is
SQL Server 2008 R2 RTM
which is not supported by Microsoft in any way. Please apply SQL Server 2008 R2 SP3 to at least get extended support. I don't think there is any issue because of this; it is totally normal, and you cannot draw any conclusion just by looking at this.
I guess this table is the biggest table in the database and is consuming the largest part of the buffer pool. I cannot say up front why the table is holding so much memory, so again:
you should first apply SP3 to rule out any possibility of leaks and then move on with troubleshooting.
For a system with 12 GB of RAM, PLE should be around 900. But since you say it is always around 300-400, that makes me think SQL Server has to work hard and frequently move pages out of memory to disk because of memory pressure. To confirm memory pressure you have to rely on other counters, not just PLE; you can open perfmon and add the following counters. It would be best to create a Data Collector Set to monitor the performance counters.
- SQLServer:Memory Manager -- Target Server Memory: the amount of memory SQL Server is trying to acquire.
- SQLServer:Memory Manager -- Total Server Memory: the amount of memory SQL Server has currently acquired. (Ideally the Target value should be less than or equal to Total.)
- Page reads/sec: the number of physical database page reads issued per second, totalled across all databases. Because physical I/O is expensive, you may be able to minimize the cost by using a larger data cache, adding intelligent indexes, writing more efficient queries, or changing the database design.
- Free Pages: the total number of pages on all free lists (free lists track the pages in the buffer pool that are not currently allocated to a data page and are therefore available for immediate use). This value should be high.
- Page Life Expectancy: the number of seconds a page will stay in the buffer pool without being referenced. If you have a NUMA system, analyze PLE for each node, as mentioned in this article.
- Free List Stalls/sec: the number of requests per second that had to wait for a free page. Ideally this should be zero, or as low as possible.
- SQLServer:Memory Manager -- Memory Grants Pending: if you see memory grants pending, your server is facing a SQL Server memory crunch, and increasing memory would be a good idea. For memory grants, please read this article. A non-zero value for Memory Grants Pending, combined with low PLE and high Free List Stalls, means you definitely have memory pressure and should consider providing more RAM.
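If you would rather sample these from inside SQL Server than from perfmon, most of the same counters are exposed through sys.dm_os_performance_counters; a minimal sketch, with the counter names as they appear on a 2008 R2 instance:

SELECT [object_name], counter_name, cntr_value
FROM sys.dm_os_performance_counters
WHERE counter_name IN (N'Target Server Memory (KB)', N'Total Server Memory (KB)',
                       N'Page life expectancy', N'Page reads/sec',
                       N'Free pages', N'Free list stalls/sec', N'Memory Grants Pending');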
Please note: the values for the above counters should be collected for at least 3-4 hours, and when the load on the system is relatively high. If possible, flush the buffer cache before collecting the values, although I am not telling you to do it; I know that for a prod system it would not be possible. If you post the values, I can analyze them for you.
I have no idea about the workload of the system and am just giving you options on how to proceed. You should also analyze costly queries; look for ones that involve hash joins and sorts.
A bad query that does lots of joins and scans on a large table has every capacity to drive counters like PLE, memory grants, and free list stalls to values that make you believe it is memory pressure. Yes, it is, but the cause is a poor query, missing indexes, and incorrect joins. You should also consider this.
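One place to start hunting for those is the plan cache; a minimal sketch ordering by total worker time (the TOP 10 cut-off is arbitrary):

SELECT TOP 10
    qs.total_worker_time, qs.total_physical_reads, qs.execution_count,
    SUBSTRING(st.text, (qs.statement_start_offset / 2) + 1,
        ((CASE qs.statement_end_offset WHEN -1 THEN DATALENGTH(st.text)
          ELSE qs.statement_end_offset END - qs.statement_start_offset) / 2) + 1) AS statement_text
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS st
ORDER BY qs.total_worker_time DESC;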
Edit:
From the output of the performance data collector, as requested for the above counters, below are some notable points:
- Target Server Memory and Total Server Memory remained constant, and their values were equal throughout the data collection (from 10 AM in the morning to 3 PM in the afternoon), which points to the fact that the SQL Server database engine was happy with its current memory.
- Memory Grants Pending was always zero.
- Free List Stalls/sec was always zero, which means no request had to wait for a free page.
- There was a considerable decrease in PLE from 2:15 PM to 3:18 PM, and at the same time Page reads/sec was very high, which makes me believe some query/process started after 2:00 PM that required too many pages in memory, causing a flurry of activity in the buffer pool.
You need to find out what query is running after 2:00 PM; since it was still running, it makes me think there is a process/job starting at 2:00 PM. This process/job/query is trying to read too many pages, which causes the lazy writer to flush too many pages because the query is requesting space for them.
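To narrow that down, you could check which Agent jobs kicked off around 2:00 PM and what is running and reading heavily right now; a minimal sketch (the date literal is a placeholder for your 2:00 PM window):

-- Agent jobs that started in the window
SELECT j.name, ja.start_execution_date, ja.stop_execution_date
FROM msdb.dbo.sysjobactivity AS ja
JOIN msdb.dbo.sysjobs AS j ON j.job_id = ja.job_id
WHERE ja.start_execution_date >= '20130101 14:00'; -- placeholder date/time

-- currently running requests, heaviest readers first
SELECT r.session_id, r.start_time, r.status, r.reads, st.text
FROM sys.dm_exec_requests AS r
CROSS APPLY sys.dm_exec_sql_text(r.sql_handle) AS st
WHERE r.session_id > 50 -- rule of thumb: user sessions
ORDER BY r.reads DESC;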
It could be that the query is missing an index, or that it is producing a bad plan due to skewed statistics, or that it needs to be rewritten to read only a subset of the data rather than the whole set.