If you want the size of the redo log members (the files on disk), use the query below:
-- Show redo log info
SET LINESIZE 300
COLUMN REDOLOG_FILE_NAME FORMAT a50
SELECT
  a.GROUP#,
  a.THREAD#,
  a.SEQUENCE#,
  a.ARCHIVED,
  a.STATUS,
  b.MEMBER AS REDOLOG_FILE_NAME,
  (a.BYTES/1024/1024) AS SIZE_MB
FROM v$log a
JOIN v$logfile b ON a.GROUP# = b.GROUP#
ORDER BY a.GROUP# ASC;
The query output should look like:
    GROUP#    THREAD#  SEQUENCE# ARC STATUS           REDOLOG_FILE_NAME                                  SIZE_MB
---------- ---------- ---------- --- ---------------- -------------------------------------------------- -------
         1          1          4 YES INACTIVE         /ORACLE/oradata/orcl1/redo01.log                         50
         2          1          5 YES INACTIVE         /ORACLE/oradata/orcl1/redo02.log                         50
         3          1          6 NO  CURRENT          /ORACLE/oradata/orcl1/redo03.log                         50
This is an awesome post.
To answer your final question, I'd speculate that your answer is "yes".
That said, I probably would have pursued soft NUMA before resorting to the trace flags. I think you are right about the NUMA node allocation, and that could be at the root of your problem. Via soft NUMA you could scale the requests out across your NUMA nodes (to 4, if that's the correct count) and then assign each host to a specific NUMA node via IP address. In addition, I'd disable hyper-threading. Combined, these changes would likely reduce the issue, although at the cost of fewer schedulers.
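If it helps, here's a minimal sketch for inspecting and enabling soft NUMA, assuming SQL Server 2016 or later (older versions configure soft NUMA through the registry instead):
-- Check whether soft NUMA is OFF, ON (automatic), or MANUAL (registry-configured)
SELECT softnuma_configuration, softnuma_configuration_desc
FROM sys.dm_os_sys_info;
-- Enable automatic soft-NUMA partitioning; takes effect after an instance restart
ALTER SERVER CONFIGURATION SET SOFTNUMA ON;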
On a separate note, I'd look at forced parameterization - the fact that your load is driving your CPU so high is very interesting, and it may be worth investigating.
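If you want to experiment with it, it's a single database-level setting - the database name below is a placeholder, and you should test first, since it changes plan caching for every query in the database:
-- Placeholder database name; replaces literals with parameters at compile time
ALTER DATABASE [YourDatabase] SET PARAMETERIZATION FORCED;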
Lastly, on multi-NUMA-node systems, I typically dump the output of the following queries to a table every N seconds. It makes for some interesting analysis when the workload changes or trace flags are implemented (a sketch of the polling scaffold follows the queries):
SELECT getdate() AS poll_time, node_id, node_state_desc, memory_node_id,
       online_scheduler_count, active_worker_count, avg_load_balance, idle_scheduler_count
FROM sys.dm_os_nodes WITH (NOLOCK)
WHERE node_state_desc <> N'ONLINE DAC'
and
SELECT TOP 10 getdate() AS sample_poll, wait_type, COUNT(*) AS task_count
FROM sys.dm_os_waiting_tasks
WHERE wait_type NOT IN
      ('CLR_SEMAPHORE','LAZYWRITER_SLEEP','RESOURCE_QUEUE','SLEEP_TASK','SLEEP_SYSTEMTASK',
       'SQLTRACE_BUFFER_FLUSH','WAITFOR','BROKER_TASK_STOP',
       'BROKER_RECEIVE_WAITFOR','OLEDB','CLR_MANUAL_EVENT','CLR_AUTO_EVENT')
GROUP BY wait_type
ORDER BY COUNT(*) DESC
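As a rough sketch of the scaffold (the table name and loop are mine; in practice I'd schedule the insert through an Agent job rather than an open loop, and the column types are approximate):
-- Hypothetical landing table for the node snapshots
CREATE TABLE dbo.node_poll (
    poll_time              datetime      NOT NULL,
    node_id                smallint      NOT NULL,
    node_state_desc        nvarchar(256) NOT NULL,
    memory_node_id         smallint      NOT NULL,
    online_scheduler_count smallint      NOT NULL,
    active_worker_count    int           NOT NULL,
    avg_load_balance       int           NOT NULL,
    idle_scheduler_count   smallint      NOT NULL
);
-- Poll every 30 seconds (N = 30 here); kill the session to stop the loop
WHILE 1 = 1
BEGIN
    INSERT INTO dbo.node_poll
    SELECT getdate(), node_id, node_state_desc, memory_node_id,
           online_scheduler_count, active_worker_count, avg_load_balance, idle_scheduler_count
    FROM sys.dm_os_nodes WITH (NOLOCK)
    WHERE node_state_desc <> N'ONLINE DAC';

    WAITFOR DELAY '00:00:30';
END;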
Best Answer
"Free buffer wait" event usually means the buffer cache is not big enough. Increasing the size of the SGA will alleviate this wait event by allowing more changes to happen before all buffers are dirty.
"Redo log check point complete" wait event happens when a block that has been changed in the buffer cache has not yet been written to the disk. The online redo log that contains this change cannot be archived until this has happened. Adding extra log files or increasing the size of the online redo logs will alleviate this wait event in the same way as increasing the size of the SGA - more changes can be made before this error is encountered.
However, both of these events have the same root cause: too many dirty blocks in the buffer cache. As you suggest above, that points to poor DBWR performance. Try adding extra DBWR processes; if that doesn't help, you probably have an I/O problem.
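For example (the value is illustrative; DB_WRITER_PROCESSES is a static parameter, so the change takes effect after an instance restart):
-- Illustrative value; static parameter, requires an instance restart
ALTER SYSTEM SET db_writer_processes = 4 SCOPE=SPFILE;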