I believe you are seeing the symptoms of an issue with Windows 2003 requiring contiguous memory, which causes running processes, including Analysis Services, to trim their working set memory. With large amounts of memory allocated, this trim can take a significant amount of time to complete, and while it is running new allocations are blocked, causing the process to stall.
This issue is fixed in Windows 2008 onwards. This support article is for SQL Server but has all the links to the relevant pages: http://support.microsoft.com/kb/918483. In particular, note this quote:
In Windows Server 2008, the allocation of physically contiguous memory
is greatly enhanced. Requests to allocate contiguous memory are much
more likely to succeed because the memory manager now dynamically
replaces pages, typically without trimming the working set or
performing I/O operations. In addition, many more types of pages—such
as kernel stacks and file system metadata pages, among others—are now
candidates for replacement. Consequently, more contiguous memory is
generally available at any given time. In addition, the cost to obtain
such allocations is greatly reduced.
I believe that to address this issue you will either have to upgrade to Windows 2008 or later, or test granting Lock Pages in Memory to the Analysis Services account. I have also had issues with memory allocation on Windows 2003 that required a reboot to clear.
This SQLCAT blog compares the changes between Windows Server 2003 & 2008.
This blog mentions some of the issues of LPIM with Analysis Services (near the end).
Also see this Knowledge Base article on working set trimming causing performance problems, and this article on how to identify drivers that allocate contiguous memory.
This is an awesome post.
To answer your final question, I'd speculate that your answer is "yes".
That said, I probably would have pursued soft-NUMA before resorting to the trace flags. I think you are right about the NUMA node allocation, and that could be at the root of your problem. Via soft-NUMA, you could scale out the requests across your NUMA nodes (4, if that's the correct count) and then assign each host, via IP address, to a specific NUMA node. In addition to that, I'd disable hyper-threading. Combined, the issue would likely decrease, though at the cost of fewer schedulers. A sketch for checking the node layout follows below.
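Before carving anything up with soft-NUMA, it's worth confirming the physical layout the instance actually sees. This is only a hedged sketch of that sanity check, not your exact configuration:

-- Hedged sketch: count visible online schedulers per NUMA node
-- so the soft-NUMA node definitions can match the hardware.
-- Hyper-threaded logical CPUs will inflate these counts.
SELECT parent_node_id,
       COUNT(*) AS scheduler_count
FROM sys.dm_os_schedulers
WHERE status = N'VISIBLE ONLINE'
GROUP BY parent_node_id
ORDER BY parent_node_id;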
On a separate thought, I'd look at forced parameterization. The fact that your load is driving your CPU so high is very interesting, and it may be worth looking into that; a sketch follows below.
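If you do want to test it, forced parameterization is a database-level setting. The database name here is a placeholder:

-- Hedged sketch: enable forced parameterization for the database
-- taking the ad hoc load. YourDatabase is a placeholder name.
ALTER DATABASE [YourDatabase] SET PARAMETERIZATION FORCED;

-- To revert to the default behaviour:
ALTER DATABASE [YourDatabase] SET PARAMETERIZATION SIMPLE;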
Lastly, on multi-NUMA-node systems, I typically have the output of the following queries dumped to a table every N seconds (a minimal sketch of that wrapper follows after the queries). It makes for some interesting analysis when the workload changes or trace flags are implemented:
SELECT getdate() as poll_time, node_id, node_state_desc, memory_node_id, online_scheduler_count, active_worker_count, avg_load_balance, idle_scheduler_count
FROM sys.dm_os_nodes WITH (NOLOCK)
WHERE node_state_desc <> N'ONLINE DAC'
and
SELECT top 10 getdate() as sample_poll, wait_type, count (*)
FROM sys.dm_os_waiting_tasks
WHERE [wait_type] NOT IN
('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK','SLEEP_SYSTEMTASK',
'SQLTRACE_BUFFER_FLUSH','WAITFOR', 'BROKER_TASK_STOP',
'BROKER_RECEIVE_WAITFOR', 'OLEDB','CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT' )
GROUP BY wait_type
ORDER BY COUNT (*) DESC
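For the "dump to a table" part, this is a hedged sketch of the kind of wrapper I mean; the table name, loop, and 30-second interval are placeholders, and a SQL Agent job on a schedule works just as well:

-- Hedged sketch: persist the node snapshot every 30 seconds.
-- dbo.node_poll is a placeholder name.
CREATE TABLE dbo.node_poll
(
    poll_time datetime NOT NULL,
    node_id int NOT NULL,
    node_state_desc nvarchar(256) NOT NULL,
    memory_node_id int NULL,
    online_scheduler_count int NULL,
    active_worker_count int NULL,
    avg_load_balance int NULL,
    idle_scheduler_count int NULL
);

WHILE 1 = 1
BEGIN
    INSERT INTO dbo.node_poll
    SELECT getdate(), node_id, node_state_desc, memory_node_id,
           online_scheduler_count, active_worker_count,
           avg_load_balance, idle_scheduler_count
    FROM sys.dm_os_nodes
    WHERE node_state_desc <> N'ONLINE DAC';

    WAITFOR DELAY '00:00:30';
END;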
Best Answer
The metric is trying to measure the number of logical IO operations performed by each request, on average. If your server is, let's say, serving a busy web site, then it is probably a good metric (perhaps 100 is even too high for such a workload!).
The witness will only have internal activity. In other words, it will touch data files internally, due to various background processes, despite never seeing any actual batch. As such, all of that internal logical IO will be reported as if it were caused by the few batches this server has seen.
I would say that the metric Thomas is recommending has some value, but only in a workload similar to what Thomas had in mind. A server that has only background activity, like your witness, is not going to report anything meaningful with this ratio. A server that runs intensive DW reports may see only 1 batch request per hour and run millions of page lookups for that one request. An absolute number like 'must be below 100' is only valuable when accompanied by small-print disclaimers like 'must be below 100 if your workload has frequent enough batch requests to establish a significant statistical trend, and each batch request is expected to perform under 100 page lookups on average'. You get the idea... A sketch of how to compute the ratio yourself follows below.
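For reference, a hedged sketch of computing the ratio from the cumulative performance counters. Both counters accumulate since instance startup, so this gives a lifetime average rather than a point-in-time rate:

-- Hedged sketch: page lookups per batch request, averaged over
-- the instance's uptime. NULLIF guards against a server (like a
-- witness) that has seen essentially no batches.
SELECT
    (SELECT cntr_value
     FROM sys.dm_os_performance_counters
     WHERE counter_name = N'Page lookups/sec'
       AND object_name LIKE N'%Buffer Manager%') * 1.0
    /
    NULLIF((SELECT cntr_value
            FROM sys.dm_os_performance_counters
            WHERE counter_name = N'Batch Requests/sec'
              AND object_name LIKE N'%SQL Statistics%'), 0)
        AS page_lookups_per_batch;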