Sudden increase in log file sync waits

Tags: oracle, oracle-11g-r2, performance

I'm on Oracle 11gR2 with a two-node RAC system. Storage is shared, fiber-attached, on an EMC Clariion.

Last Friday things went bad, fast. All of a sudden, processes that had run fine for years became very, very slow. I noticed a sudden increase in log file sync waits, and the LGWR process is listed as a blocker for several processes. Nothing changed on that Friday that we're aware of. Also, it appears to affect just one node.

Statspack reports confirm that log file sync wait time went from around 1ms to 47ms!
Additionally, Statspack shows the following wait histogram, which means a large share of the waits are now long:

                           Total ----------------- % of Waits ------------------
Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
log file sync               100K    .0    .3   1.7   9.0  25.4  31.0  32.4    .1

Before the slowdown it looked like this:

                           Total ----------------- % of Waits ------------------
Event                      Waits  <1ms  <2ms  <4ms  <8ms <16ms <32ms  <=1s   >1s
-------------------------- ----- ----- ----- ----- ----- ----- ----- ----- -----
log file sync              1589K  72.3  20.4   5.4   1.2    .6    .1    .0
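
To put a number on the shift, here is a minimal Python sketch that sums the buckets from the two histograms above. It assumes each column is its own bucket (so `<32ms` means 16ms up to 32ms) and treats the blank `>1s` cell in the earlier report as 0. The names `BUCKETS`, `BEFORE`, `AFTER` and `share_at_or_above` are made up for illustration.

```python
# Bucket labels and percentages copied from the two Statspack histograms above.
BUCKETS = ["<1ms", "<2ms", "<4ms", "<8ms", "<16ms", "<32ms", "<=1s", ">1s"]
AFTER = [0.0, 0.3, 1.7, 9.0, 25.4, 31.0, 32.4, 0.1]
# Assumption: the ">1s" column is blank in the earlier report, so it is set to 0.0.
BEFORE = [72.3, 20.4, 5.4, 1.2, 0.6, 0.1, 0.0, 0.0]


def share_at_or_above(percentages, first_slow_bucket):
    """Sum the percentage of waits in first_slow_bucket and every slower bucket."""
    start = BUCKETS.index(first_slow_bucket)
    return round(sum(percentages[start:]), 1)


# "<32ms" is read as the 16-32ms bucket, so this is the share of waits >= 16ms.
slow_before = share_at_or_above(BEFORE, "<32ms")
slow_after = share_at_or_above(AFTER, "<32ms")
print(f"waits >= 16ms: before {slow_before}%, after {slow_after}%")
```

Under those assumptions, almost none of the waits took 16ms or longer before, while well over half do now.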

What can cause this? What should I be checking for?

Best Answer

A log file sync wait occurs when a session commits and the redo buffer needs to be flushed to disk by LGWR. The session has to wait until that write completes.

An increase in the number of log file syncs generally means that one of your developers has gone commit-happy and is committing far too frequently, after every row, for example.

Here you probably have a process that performs around 1.5 million DML statements and issues a commit after each one, so look for a job that loads a few million rows of data.
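
As a rough illustration of the difference between committing every row and committing in batches, here is a minimal Python sketch. It uses the standard-library `sqlite3` module purely as a stand-in so it runs anywhere; it is not Oracle, and the table `t`, the batch size of 500, and the helper names `load_rows` and `demo` are all hypothetical. The point is only the commit count: on Oracle, each of those commits would normally make the session wait on log file sync.

```python
import sqlite3


def load_rows(conn, rows, batch_size):
    """Insert rows, committing once per batch_size rows; return the commit count."""
    commits = 0
    cur = conn.cursor()
    for i, row in enumerate(rows, start=1):
        cur.execute("INSERT INTO t (id, val) VALUES (?, ?)", row)
        if i % batch_size == 0:
            conn.commit()
            commits += 1
    # Commit any rows left over after the last full batch.
    if len(rows) % batch_size != 0:
        conn.commit()
        commits += 1
    return commits


def demo(n_rows=1000):
    """Load the same rows twice: once committing per row, once per 500 rows."""
    rows = [(i, f"row {i}") for i in range(n_rows)]
    results = {}
    for label, batch_size in (("commit per row", 1), ("commit per 500 rows", 500)):
        conn = sqlite3.connect(":memory:")  # in-memory stand-in database
        conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, val TEXT)")
        results[label] = load_rows(conn, rows, batch_size)
        conn.close()
    return results


if __name__ == "__main__":
    for label, commits in demo().items():
        print(f"{label}: {commits} commits")
```

The same number of rows is loaded either way; only the number of commits, and therefore the number of times the session would have to wait for the redo flush, changes.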