Increase Oracle buffer cache gets slower

oracleoracle-11g-r2performance

So we're contemplating increasing our Oracle 11gR2 buffer cache. They said they tried it in the past and things appeared to get worse – so they undid the change. They, however, have no idea why things could have gotten worse.

Its possible they were preemptive and just didn't give it time considering it was full server reboot, but what are the situations where increasing the cache would yield worse results? I've seen this page (situation B) but I can't find anybody else claiming that. We're definitely doing more scattered reads than 'normal' probably given the kinds of queries we have to run.

Best Answer

In any system, if you make the part of the system that is not the bottleneck more efficient, there is the potential to make the system as a whole less efficient. If, for example, your system's bottleneck is the CPU, making I/O more efficient by increasing the size of the buffer cache may end up causing many more threads to be waiting on the CPU which can make the system as a whole much slower as the operating system is now spending more time swapping between the competing thread, the CPU caches get overwhelmed, and you get the classic hockey stick graph of response time where once you get to the "knee", adding a bit more load adds a lot to the response time of the system.

If you have a system where I/O requests are highly uncorrelated, increasing the size of the cache could mean that Oracle will spend more time looking through the cache for a particular block before being forced to do a physical read after all. That's probably relatively unlikely in an OLTP application but it's possible in a data warehousing type system where various reports are asking for blocks that no other report will want to reuse.

Of course, whether this is actually the problem that they encountered in the past is still an open question. It is entirely possible that it was simply a matter of having a cold cache that needed to be warmed up or a perception problem where someone drew an incorrect conclusion from an earlier test or a poorly constructed test where, for example, someone inadvertently increased the size of the buffer cache at the expense of the PGA rather than at the expense of the operating system I/O cache. I tend to share your distrust of human recollections of the past that aren't corroborated by some nice AWR reports so I'm not claiming that either of the above explanations are likely merely that they are potential explanations if the human recollections are correct.