Increase Oracle buffer cache gets slower

oracleoracle-11g-r2performance

So we're contemplating increasing our Oracle 11gR2 buffer cache. They said they tried it in the past and things appeared to get worse – so they undid the change. They, however, have no idea why things could have gotten worse.

Its possible they were preemptive and just didn't give it time considering it was full server reboot, but what are the situations where increasing the cache would yield worse results? I've seen this page (situation B) but I can't find anybody else claiming that. We're definitely doing more scattered reads than 'normal' probably given the kinds of queries we have to run.

Best Answer

In any system, if you make the part of the system that is not the bottleneck more efficient, there is the potential to make the system as a whole less efficient. If, for example, your system's bottleneck is the CPU, making I/O more efficient by increasing the size of the buffer cache may end up causing many more threads to be waiting on the CPU which can make the system as a whole much slower as the operating system is now spending more time swapping between the competing thread, the CPU caches get overwhelmed, and you get the classic hockey stick graph of response time where once you get to the "knee", adding a bit more load adds a lot to the response time of the system.

If you have a system where I/O requests are highly uncorrelated, increasing the size of the cache could mean that Oracle will spend more time looking through the cache for a particular block before being forced to do a physical read after all. That's probably relatively unlikely in an OLTP application but it's possible in a data warehousing type system where various reports are asking for blocks that no other report will want to reuse.

Of course, whether this is actually the problem that they encountered in the past is still an open question. It is entirely possible that it was simply a matter of having a cold cache that needed to be warmed up or a perception problem where someone drew an incorrect conclusion from an earlier test or a poorly constructed test where, for example, someone inadvertently increased the size of the buffer cache at the expense of the PGA rather than at the expense of the operating system I/O cache. I tend to share your distrust of human recollections of the past that aren't corroborated by some nice AWR reports so I'm not claiming that either of the above explanations are likely merely that they are potential explanations if the human recollections are correct.

Related Solutions

Sudden increase in log_file_sync waits

Log file sync occurs when a commit is made and the redo buffer needs to be flushed to disk. The session has to wait for that to happen.

An increase in the number of log file syncs generally means that one of your developers has gone commit-happy, and is committing far too frequently -- every row, for example.

Here you probably have a process that performs around 1.5 million DML statements with a commit being issued after each one, so look out for some process that loads a few million rows of data.

Oracle query is slower the second time

Cases like this can usually be solved for certain by enabling trace events 10046 (if you have authority to do so), and run tkprof on the resulting trace files:

alter session set events '10046 trace name context forever, level 8';
select /* try one */ -- add your SQL here and invoke it;
select /* try two */ -- add your same SQL here and invoke it;
show parameter user_dump_destination

Invoke it a few times, and then turn tracing off, log off to ensure the trace files get closed:

alter session set events '10046 trace name context off';
exit;

Then find your trace file, and run tkprof on it, assuming your trace file name is ${ORACLE_SID}_ora_39110.trc:

cd $ORACLE_BASE/diag/rdbms/$ORACLE_SID/$ORACLE_SID/trace
tkprof ${ORACLE_SID}_ora_39110.trc ${ORACLE_SID}_ora_39110.trc.txt
vi ${ORACLE_SID}_ora_39110.trc.txt

The output of tkprof will show you execution statistics (logical reads, physical reads, etc). And you can look at the actual trace file if the tkprof output doesn't help.

If that doesn't help, you can enable 10053 event, and enjoy wading through thousands of lines of output; but sometimes you need to do that.

But first things (well easier things) first; do as others suggested; see if table statistics are up to date, optimizer settings, etc. If you are using the exact same criteria each execution, could be the optimizer is "on the border" on plans. I would ensure you have collected system statistics; see the documentation on dbms_stats.gather_system_stats pl/sql package.

Best Answer

Related Solutions

Sudden increase in log_file_sync waits

Oracle query is slower the second time

Related Question