Oracle: Database writers idle until ‘Checkpoint not complete’ even with multiple redo logs

oracle, oracle-11g-r2, performance, performance-tuning

I work on an application with a steady data inflow of 400 MB/h. For that purpose we have set up 10 redo log groups of 100 MB each, so we switch logs approximately every 15 minutes.

Because of limited disk space, we have a retention job that deletes 'old' data after a certain time. When this job runs, it performs a cascading delete based on timestamps. So far so good…

When the data removal process is running, the log switch interval shortens to about 15 s, and after 9 log switches we see the infamous 'Checkpoint not complete' message.
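For reference, the switch rate can also be confirmed from the data dictionary instead of top/iotop; a sketch of such a query against `v$log_history` (the one-day window is an arbitrary assumption):

```sql
-- Count redo log switches per hour over the last day
SELECT TRUNC(first_time, 'HH24') AS hour,
       COUNT(*)                  AS switches
FROM   v$log_history
WHERE  first_time > SYSDATE - 1
GROUP  BY TRUNC(first_time, 'HH24')
ORDER  BY 1;
```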

We observed with top and iotop how the log writer (ora_lgwr_XXXX) and the database writers (ora_dbwX_XXXX) behave, and found that the database writers are completely idle until the point of 'Checkpoint not complete'. The log writer writes at MB/s rates while the database writers sit idle. According to the documentation, the database writers start working at checkpoints, and a checkpoint is triggered at every redo log switch.

So, why are the database writers not working until the redo logs are all full?

Shouldn't I see significant I/O from the database writers once the first checkpoint completes with the first redo log switch?

We see only a few accumulated kB for the two database writers before the 'Checkpoint not complete' message, despite at least 9 redo log switches. Afterwards, the database writers flush data to the database at MB/s rates.

Can the database writer processes be made more aggressive, ideally depending on the amount of data pending in the redo logs? I do not need every byte written to the database files immediately, but when a lot of dirty data is pending in the database buffer cache (with the corresponding redo logs still active) and a lot of new data is coming in, shouldn't there be a mechanism that drives the database writers to higher throughput?
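For context, one knob that does exactly this in 11g is MTTR-driven incremental checkpointing, which makes DBWn trickle out dirty buffers continuously rather than waiting for log-switch checkpoints. A sketch, where the 60-second target is purely an illustrative assumption:

```sql
-- Enable incremental checkpointing driven by a recovery-time target
-- (the 60 s value is an assumption, not a recommendation)
ALTER SYSTEM SET fast_start_mttr_target = 60 SCOPE = BOTH;

-- Check the effect on checkpoint write activity
SELECT target_mttr, estimated_mttr, writes_mttr
FROM   v$instance_recovery;
```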

Metadata:

  • 8 vCPUs
  • 56 GB RAM
  • Oracle 11.2.0.3
  • log_checkpoint_interval=0
  • log_checkpoint_timeout=1800
  • db_writer_processes=1
  • 10 redo log groups, 100MB each
  • 1 × 1 TB partition via iSCSI for all data (oradata)

Best Answer

I guess you could play with the fast_start_io_target and log_checkpoint_timeout initialization parameters to entice the database writers to write out dirty blocks more aggressively. Unless your redo logs are on a different I/O path from your data files, though, you are unlikely to see any benefit from this, because the log writer and the database writers will compete for the same I/O bandwidth. A better choice, as others have mentioned, would be to increase the size of your redo logs to accommodate the long-running delete.
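Online redo logs cannot be resized in place; the usual approach is to add larger groups and drop the old ones once they become INACTIVE. A sketch, with group numbers, file paths, and the 1 GB size all being assumptions to adapt to your layout:

```sql
-- Add larger redo log groups (paths, group numbers, and size are assumptions)
ALTER DATABASE ADD LOGFILE GROUP 11 ('/oradata/redo11.log') SIZE 1G;
ALTER DATABASE ADD LOGFILE GROUP 12 ('/oradata/redo12.log') SIZE 1G;

-- Switch and checkpoint until an old group shows INACTIVE in v$log, then drop it
ALTER SYSTEM SWITCH LOGFILE;
ALTER SYSTEM CHECKPOINT;
ALTER DATABASE DROP LOGFILE GROUP 1;
```

Repeat the switch/checkpoint/drop cycle for each of the remaining small groups, verifying the status in `v$log` before each drop.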