Sql-server – Measuring latency of a long-distance SQL connection

performancequery-performancesql serversql-server-2012

We have written a simple "client" app for a customer that connects to their SQLServer database, runs about 360 queries against a "property", and returns that data as a .Net object we call a "cProp".

The queries vary in complexity, but the very longest take perhaps 400ms. When we recently ran it for 60 properties, the entire run took about 3 minutes. The customer has decided to move their database to a remote server with 65ms between us. As a result, the same query required 51 minutes.

Now I'm 99% sure the problem is row retrieval, but I need to prove that. The DBA ran a trace on the process, but having never used it before, I was dismayed to find out how little information it contained. What I did see is that it took just under 4 minutes of server time. This reinforces my belief that the problem is in the row retrieval.

But how do I measure that? The trace returns nothing about the number of rows or total data size – am I doing it wrong? I also know that batching happens outside the server, so how do we even find out what's going on there? Generally speaking, how does one debug issues that are taking place "outside the server" so to speak.

Best Answer

Just to clarify some math here:

  • 360 queries per "property"
  • 60 properties * 360 = 21,600 queries
  • 3 minutes in duration is about 120 queries per second

So in your best case, you're averaging 120 queries per second throughput.

After moving the SQL Server to somewhere that has a ping latency of 65ms, you're averaging 7 queries per second.

Your best bet is to

  • measure wait statistics while this query is running
    • If you don't have a monitoring tool, a simple way to do this is running sp_BlitzFirst, an open source tool developed by Brent Ozar Unlimited. It'll take a snapshot of your wait stats (among other things), wait 5 seconds, and then take another snapshot, and show you a summary of the biggest issues
    • You can also see the wait stats since the server started using a query like the one in Paul Randal's blog post: Wait statistics, or please tell me where it hurts
  • Once you've got those results, examine the highest wait stats, and use the SQL Skills wait stats library to dig into what they mean

Based on your description, it's likely that you'll see ASYNC_NETWORK_IO waits. This can indicate slow network transfer. But usually it means that the client application is doing row-by-row processing, and SQL Server is waiting to send rows (when it really wants to fire them over the network as fast as possible).

On a related note - in general, it would be far more effective to try to pull larger "groups" of data - rather than running 21,000 queries, and thus making 21,000 round trips over the (potentially slow) network. This is called a "chatty" interface or API, and is generally considered to be an anti-pattern.

Consider that 65ms * 21,600 queries is 1,404,000 ms, AKA over 23 minutes of overhead. That's a big chunk of overheard that could be removed simply by limiting the number of roundtrips.