We have written a simple "client" app for a customer that connects to their SQLServer database, runs about 360 queries against a "property", and returns that data as a .Net object we call a "cProp".
The queries vary in complexity, but the very longest take perhaps 400ms. When we recently ran it for 60 properties, the entire run took about 3 minutes. The customer has decided to move their database to a remote server with 65ms between us. As a result, the same query required 51 minutes.
Now I'm 99% sure the problem is row retrieval, but I need to prove that. The DBA ran a trace on the process, but having never used it before, I was dismayed to find out how little information it contained. What I did see is that it took just under 4 minutes of server time. This reinforces my belief that the problem is in the row retrieval.
But how do I measure that? The trace returns nothing about the number of rows or total data size – am I doing it wrong? I also know that batching happens outside the server, so how do we even find out what's going on there? Generally speaking, how does one debug issues that are taking place "outside the server" so to speak.
Best Answer
Just to clarify some math here:
So in your best case, you're averaging 120 queries per second throughput.
After moving the SQL Server to somewhere that has a ping latency of 65ms, you're averaging 7 queries per second.
Your best bet is to
Based on your description, it's likely that you'll see ASYNC_NETWORK_IO waits. This can indicate slow network transfer. But usually it means that the client application is doing row-by-row processing, and SQL Server is waiting to send rows (when it really wants to fire them over the network as fast as possible).
On a related note - in general, it would be far more effective to try to pull larger "groups" of data - rather than running 21,000 queries, and thus making 21,000 round trips over the (potentially slow) network. This is called a "chatty" interface or API, and is generally considered to be an anti-pattern.
Consider that 65ms * 21,600 queries is 1,404,000 ms, AKA over 23 minutes of overhead. That's a big chunk of overheard that could be removed simply by limiting the number of roundtrips.