SQL Server – Buffer usage for index with no reads

buffer-pool, cache, memory, sql-server, sql-server-2008

Using SQL Server 2008 R2, the main transactional table in a vendor database is massive compared to all the others and has about 14 indexes. Some of these indexes don't get used in our environment, but we can't remove them. That's not a problem; it's just something we have to live with.

The question I have is about why some of these low-or-no-read indexes seem to be using memory – more than other indexes on the same large table that get used often. I would have thought that most of the buffer cache would go towards objects that are read from frequently. The only thing happening on these indexes is write overhead.

For example, one of these low-read indexes is being allocated about 2 GB of memory (58% of the index's total size) and another has 1.7 GB of memory (27% of its size). Meanwhile, the monster-sized and well-used clustered index itself only has 4 GB (2% of its size). A different NC index with lots of reads only has 100 MB in the buffer cache (5% of its size).
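
For context, here's a sketch of the kind of query that produces per-index buffer numbers like those. It maps cached pages back to their indexes through sys.dm_os_buffer_descriptors and sys.allocation_units; nothing in it is specific to my environment:

    -- Cached megabytes per index in the current database (each buffered page is 8 KB)
    SELECT
        OBJECT_NAME(p.object_id) AS table_name,
        i.name                   AS index_name,
        COUNT(*) * 8 / 1024      AS cached_mb
    FROM sys.dm_os_buffer_descriptors AS bd
    JOIN sys.allocation_units AS au
        ON au.allocation_unit_id = bd.allocation_unit_id
       AND au.type IN (1, 3)                 -- in-row and row-overflow data
    JOIN sys.partitions AS p
        ON p.hobt_id = au.container_id
    JOIN sys.indexes AS i
        ON i.object_id = p.object_id
       AND i.index_id  = p.index_id
    WHERE bd.database_id = DB_ID()           -- current database only
    GROUP BY p.object_id, i.name
    ORDER BY cached_mb DESC;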

Looking at the physical stats, I can see the fragmentation is pretty bad. That's understandable from all the writes on this table and the non-sequential inserts. I'm not sure if it could be related to memory usage, though.
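
By "physical stats" I mean sys.dm_db_index_physical_stats. A sketch of that check in LIMITED mode (so it stays cheap), with 'dbo.BigVendorTable' standing in for the real table name:

    -- Fragmentation and page counts for every index on the big table
    SELECT
        i.name AS index_name,
        ips.index_type_desc,
        ips.avg_fragmentation_in_percent,
        ips.page_count
    FROM sys.dm_db_index_physical_stats(
             DB_ID(), OBJECT_ID('dbo.BigVendorTable'), NULL, NULL, 'LIMITED') AS ips
    JOIN sys.indexes AS i
        ON i.object_id = ips.object_id
       AND i.index_id  = ips.index_id
    ORDER BY ips.avg_fragmentation_in_percent DESC;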

Looking at the operational stats for these indexes is also interesting (a sketch of the kind of query I mean follows the list below).

  • The leaf_ghost_count reminds me that there are just about as many deletes taking place on this table as there are inserts (from a regular cleaning process).
  • One of these low-read indexes has some of the highest page_lock_wait values in the database. Perhaps that's only because of the writes?
  • Two others have some of the highest page_io_latch_wait values. I understand that latch usage would be related to memory usage, so that makes sense.
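
Roughly, that's sys.dm_db_index_operational_stats, queried along these lines ('dbo.BigVendorTable' again stands in for the real table name):

    -- Ghost records, lock waits, and IO latch waits per index on the big table
    SELECT
        i.name AS index_name,
        ios.leaf_insert_count,
        ios.leaf_delete_count,
        ios.leaf_ghost_count,
        ios.page_lock_wait_count,
        ios.page_lock_wait_in_ms,
        ios.page_io_latch_wait_count,
        ios.page_io_latch_wait_in_ms
    FROM sys.dm_db_index_operational_stats(
             DB_ID(), OBJECT_ID('dbo.BigVendorTable'), NULL, NULL) AS ios
    JOIN sys.indexes AS i
        ON i.object_id = ios.object_id
       AND i.index_id  = ios.index_id
    ORDER BY ios.page_io_latch_wait_in_ms DESC;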

I realize this is an abstract question and that I'm not providing many actual stats. I'm just curious about how SQL Server comes to these buffer cache usage decisions and wonder if anyone out there understands it.

Best Answer

SQL Server's buffer pool is a wonderful thing. It's smart enough to handle all sorts of situations in a fairly intelligent way. Here are a couple of examples where the buffer pool's behavior seems strange at first glance but is actually fairly clever.

Consider a 400 GB clustered index on a server with 64 GB of memory available for the buffer pool. If a query scans the entire 400 GB index, how much buffer pool should the index use after the query is complete?

Probably not very much. If SQL Server aggressively cached all the data it accessed, maybe the last 50 GB it read during the scan would be in memory. However, the next time you run the same query, that 50 GB is useless, because SQL Server will have to read the first 350 GB from disk before it gets to the last 50 GB. This is a common enough scenario that SQL Server handles it intelligently: instead, it is more likely that all of the intermediate-level index pages will be cached, along with a few of the leaf pages.
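
If you want to check this on your own system, you can break an index's cached pages down by B-tree level: level 0 is the leaf, higher levels are the intermediate and root pages. A rough sketch (the table name and index_id are placeholders):

    -- Cached pages per B-tree level for one index
    SELECT
        bd.page_level,                       -- 0 = leaf, higher = intermediate/root
        COUNT(*)            AS cached_pages,
        COUNT(*) * 8 / 1024 AS cached_mb
    FROM sys.dm_os_buffer_descriptors AS bd
    JOIN sys.allocation_units AS au
        ON au.allocation_unit_id = bd.allocation_unit_id
       AND au.type IN (1, 3)
    JOIN sys.partitions AS p
        ON p.hobt_id = au.container_id
    WHERE bd.database_id = DB_ID()
      AND p.object_id = OBJECT_ID('dbo.BigTable')  -- placeholder table name
      AND p.index_id  = 1                          -- 1 = the clustered index
    GROUP BY bd.page_level
    ORDER BY bd.page_level;

Run it after a large scan; if the behavior described above holds, the leaf level will be only partially cached while the upper levels are mostly resident.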

I've got a server with 512 GB of memory, but it never seems to use all of it, or it uses it... strangely?

SQL Server is NUMA-aware. On NUMA (Non-Uniform Memory Access) systems, some memory is closer/faster to a CPU than other memory. For example, a four-socket system might have four memory controllers (one per socket). It is faster for a CPU to access memory on the controller attached to it than to access memory through some other CPU's controller (I've seen remote access run 1.4-1.7 times slower).

SQL Server knows this and will prefer to allocate local-node memory before remote-node memory. Since requests can run on any CPU (though they don't typically jump between them) and hence allocate memory on any node, the first request to access an index will cache its pages in local memory. In most cases you'll end up with a fairly even distribution of cached pages; however, each node is managed individually by SQL Server, so effectively your 512 GB is really four 128 GB buffer pools. You could have a situation where node 1 is using all 128 GB and discarding pages to make room, while node 2 is only using 80 GB and has lots of free space.
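
You can get a rough picture of how the buffer pool is spread across nodes by grouping the buffer descriptors by NUMA node. A quick sketch:

    -- Cached megabytes per NUMA node, across all databases
    SELECT
        bd.numa_node,
        COUNT(*) * 8 / 1024 AS cached_mb
    FROM sys.dm_os_buffer_descriptors AS bd
    GROUP BY bd.numa_node
    ORDER BY bd.numa_node;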

Why would my heavily-read indexes be cached less than indexes that only receive writes?

One possible reason is checkpoints. Depending on your recovery interval, it may be several minutes after a data change occurs before a checkpoint writes the page to the data file (it's already in the log file on disk). During this time, the page can't be discarded to make room in the buffer pool.
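
You can see how much of each index's cached footprint is dirty (changed but not yet checkpointed) from the is_modified flag on the buffer descriptors. A sketch, with a placeholder table name:

    -- Dirty vs. total cached pages per index on one table
    SELECT
        i.name   AS index_name,
        COUNT(*) AS cached_pages,
        SUM(CASE WHEN bd.is_modified = 1 THEN 1 ELSE 0 END) AS dirty_pages
    FROM sys.dm_os_buffer_descriptors AS bd
    JOIN sys.allocation_units AS au
        ON au.allocation_unit_id = bd.allocation_unit_id
       AND au.type IN (1, 3)
    JOIN sys.partitions AS p
        ON p.hobt_id = au.container_id
    JOIN sys.indexes AS i
        ON i.object_id = p.object_id
       AND i.index_id  = p.index_id
    WHERE bd.database_id = DB_ID()
      AND p.object_id = OBJECT_ID('dbo.BigTable')  -- placeholder table name
    GROUP BY i.name
    ORDER BY dirty_pages DESC;

A write-only index that gets touched all over (non-sequential inserts plus deletes) can keep a fair number of dirty pages in the cache this way between checkpoints.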

Besides setting the size, are there other ways to change how the buffer pool behaves?

There isn't a whole lot of control over the buffer pool and the decisions it makes. That's fine, because you typically don't need to tweak it. Watching the perfmon counters for "SQLServer:Buffer Manager" and "SQLServer:Buffer Node" can provide insight into what the buffer pool is doing, whether it is sufficient in size, and whether your IO subsystem can keep up with the checkpoints.
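
Those same counters are also exposed through sys.dm_os_performance_counters if you'd rather poll them with T-SQL, for example:

    -- Buffer Manager and per-node Buffer Node counters (page life expectancy, etc.)
    SELECT [object_name], counter_name, instance_name, cntr_value
    FROM sys.dm_os_performance_counters
    WHERE [object_name] LIKE '%Buffer Manager%'
       OR [object_name] LIKE '%Buffer Node%'
    ORDER BY [object_name], counter_name;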


Regarding a couple of items you mentioned:

  • Fragmentation on disk doesn't affect memory usage directly. However, if an index has 50% free space on its pages due to deletes, it will take up twice as much room on disk and in memory, but that's related to free space, not fragmentation (one way to check this is sketched after this list).
  • PAGE_IO_LATCH waits measure the time it takes for the IO subsystem to load pages into memory. They are an indicator of disk performance, not of memory pressure.
  • You are correct: the high page lock waits are due to the writes. The writes make pages dirty, and dirty pages have to stick around in the buffer pool until they are checkpointed to disk.
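
For the free-space point above, one way to check it is to average free_space_in_bytes across each index's cached pages (a sketch; the table name is a placeholder, and roughly 8,096 usable bytes per 8 KB page is assumed):

    -- Average free space on cached pages, per index on one table
    SELECT
        i.name              AS index_name,
        COUNT(*) * 8 / 1024 AS cached_mb,
        AVG(CAST(bd.free_space_in_bytes AS float)) / 8096.0 * 100 AS avg_free_space_pct
    FROM sys.dm_os_buffer_descriptors AS bd
    JOIN sys.allocation_units AS au
        ON au.allocation_unit_id = bd.allocation_unit_id
       AND au.type IN (1, 3)
    JOIN sys.partitions AS p
        ON p.hobt_id = au.container_id
    JOIN sys.indexes AS i
        ON i.object_id = p.object_id
       AND i.index_id  = p.index_id
    WHERE bd.database_id = DB_ID()
      AND p.object_id = OBJECT_ID('dbo.BigTable')  -- placeholder table name
    GROUP BY i.name
    ORDER BY avg_free_space_pct DESC;

An index averaging 50% free space per cached page is, as noted, paying for that space in memory as well as on disk.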