Sql-server – How to it take longer to replay a captured trace than to capture it

performancesql servertrace

I am attempting to use trace captures to determine how much of a performance issue was database related and how much was code related. However, the trace is taking longer to replay than it took to capture in the first place.

What I was after was the total time it would take to run our SQL statements alone, without the overhead of our actual code. If the replay took half the total time of the capture, I was going to assume that the fault was both code and database, if the replay took almost the same time as the capture, I was going to assume it was database, if the replay took almost no time compared to our capture, I was going to assume it was our code. I was assuming the replay was a very efficient way of executing all the statements.

Instead when I replay my trace it takes at least twice as long to run as the original capture took. This is true whether I allow multiple threads or require it to be sequential. I am new to this, so I could be doing something dumb (possibly the whole idea here being dumb), but it seems to me that replay ought to be faster, not slower.

My capture trace is the replay template + SQL:statement completed (so that it doubles as a tuning capture).

Best Answer

SQL Server can handle multiple concurrent requests simultaneously from multiple machines and threads. Unfortunately, replay can only spit them out one at a time, serially, from a single machine - introducing multiple layers of bottlenecks. There is probably also some overhead in replay to read and prepare the statements to send, that doesn't exist in your application.

If you want the replay to be more realistic, you should look into SQL Server 2012's Distributed Replay, which is able to distribute the replay across multiple machines. That said, you're not likely to ever match the original load time perfectly, and there is likely still some inherent overhead in the preparation of each statement during replay, even if you are able to eliminate some of the bottlenecks on concurrency.