Sql-server – How to interpret the log when I run DBCC TRACEON (3502, 3504, 3605, -1)

dbccsql serversql-server-2008-r2

I have been using DBCC Traceon (3502, 3504, 3605, -1) because it was recommended in a blog for discovering performance issues related to I/O. I'm running MS SQL Server 2008 R2 SP1

Results in my SQL Log file look something like this (numbers fudged a little):

about to log checkpoint end

last target outstanding 2, avgWriteLatency 40ms

Average Throughput: 0.67 MB/sec, I/O Saturation: 79, Context Switches
201

FlushCache: cleaned up 125 Bufs with 69 writes, in 1447ms (avoided 0
new dirty bufs)

Ckpt dbid 9 phase 1 ended (8)

about to log checkpoint begin.

I don't really know how to read this, or break it down in a way that I get get anything truly meaningful out of it.

What does 'last target outstanding mean?"

Does the average write latency mean the overhead time it takes per write? or the time between writes? 40ms seems high, the physical drive is a 1TB, and it's RAID5 configured.

What is I/O saturation?

What does it have to do with the Context Switches. I'm assuming Context switches have something to do with multi-tasking. Changing between jobs/writes.

FlushCache. I realize this has to do with clearing out the cache. What are the Bufs? Are these pages of data that needed to be written? What are the dirty Bufs? Why would they be avoided?

A detailed breakdown would be appreciated.

Best Answer

The trace flags that you have turned on will tell you what a checkpoint is doing behind the scenes.

  • 3502: writes to the error log when a checkpoint starts and finishes
  • 3504: writes to the error log information about what is written to disk
  • 3605: allows trace prints to go to the error log

Refer to Paul Randall's blog post for more details on the above. Also, Fine Tuning for Optimal Performance has an excellent info - especially In Search of Spikes section.

Some really internals reading :

  1. How It Works: When is the FlushCache message added to SQL Server Error Log?
  2. How It Works: SQL Server Checkpoint (FlushCache) Outstanding I/O Target
  3. SQL Server checkpoint problems

Instead of concentrating directly on checkpoint behaviour, I would suggest you to look at DMVs and Perfmon (disk related) -

  1. sys.dm_io_virtual_file_stats
  2. Physical Disk Object: Avg. Disk Queue Length
  3. Avg. Disk Sec/Read
  4. Avg. Disk Sec/Write
  5. Physical Disk: %Disk Time
  6. Avg. Disk Reads/Sec
  7. Avg. Disk Writes/Sec

You can refer to Investigating I/O bottlenecks