Sql-server – a SET TRANSACTION command show a long Duration when the CPU is low

performance-tuningsql server

I got enlisted to troubleshoot a performance issue on a SQL Server 2016 Enterprise Edition at a customer site and have been handed a SQL Profiler trace captured during a period of "poor" performance. One of my observations is that seemingly innocuous statements like SET TRANSACTION ISOLATION register a duration of the order of seconds.

enter image description here
In the image above, the StmtCompleted event reports a duration as well as a CPU of 0milliseconds but the BatchCompleted event reports a duration of 1907ms (I have translated the profiler duration to milliseconds by dividing it by 1000). Why should there be such a drastic gap in the duration reported by BatchCompleted event and the StmntCompleted event ? Also the BatchCompleted event reports a CPU of 15ms but not the StmtCompleted. Why should just sending an ACK back to the client need so much CPU or duration? There are other statements too which report a high duration on the BatchCompleted with almost 0ms reported against the StmtCompleted event. There are sp_reset_connection events which show a CPU=0ms but high durations of the order of seconds. Ditto with BEGIN TRANS, COMMIT TRANS, SCOPE_IDENTITY etc. And to make complicate matters, there are many instances of the same commands which have executed within acceptable duration of a few milliseconds during the same period. I am at a loss to come up with an explanation. Wondering if the stack exchange community can come up with a plausible explanation for these. The customer swears that the CPU usage & Network Usage as reported at the Operating System level were normal but admitted that the disk usage seemed higher than usual. But most of the commands mentioned above should not be affected by disk activity.

The site unfortunately does not do continuous monitoring. They run a profiler trace whenever they feel that the SQL Server is showing "slowness". Wait stats are not captured either and they periodically reboot the machine.

I am going to suggest that the customer snapshot wait stats in addition to a profiler trace. I am hoping that a very high "ASYNC NETWORK IO" wait shows up :). Should I request for any other type of monitoring ? Any help appreciated.

Best Answer

For a statement like that one (and the others you mentioned), network I/O is not included in the statement_completed event. Network I/O is included in the batch_completed event. Your suggestion to look at ASYNC_NETWORK_IO waits seems like the best next step to me.

The statement_completed can include network I/O time if the statement being executed uses a "streaming"-style execution plan (a query that returns lots of rows via a scan of a single table or simple nested loops join). This is discussed a bit at the end of my answer here: https://dba.stackexchange.com/a/244783/6141

Note that it might not actually be a problem with slow network traffic per se. It could be that the application server is maxed out on CPU, or the application is (for one reason or another) unable to continue to receive the results from SQL Server for a second or two. You might want to see if they can capture performance metrics on the app server during one of these problem times as well.

Also, consider suggesting that they stop using the READ UNCOMMITTED isolation level ?