SQL Server – Is a millions-row query fetched in the SSMS client slower than in other applications?

optimization, performance, query-performance, sql-server

I'm not sure if what I'm thinking makes sense. I have a SQL Server database table with 30 million rows.

If I do a select A, B, C, D from huge_table (about 30 million rows, roughly a 4 GB total footprint), does it take additional time for the SSMS client to actually display all 30 million rows, compared with an application fetching (and processing) the exact same query with no display requirements?

I mean, just fetching two columns (a datetime and an int) from this 30-million-row table, which has 8 columns in total, takes 6 minutes to pull up all the records in SSMS. That's with no additional joins, no calculations, and not even every field.

Granted, it probably depends on my local client hardware (decent enough), and also on the fact that the database is in Europe while I'm in the United States, though in practice I've found that makes a query maybe 1.5x slower at most.

I suppose I'm used to dealing with smaller datasets (like 2 million rows).

Is this simply inherent lag in SSMS, or does it suggest a reasonable fetch time for an application on similar hardware?

I'm wondering how to 'speed up' this query, which, by design, is a full scan. I can chunk it into smaller pieces, but the total time would be the same.

Should I aggregate or pre-calculate the 30 million rows? To be honest, that will not save time: with all the necessary dimensions, the 30 million would only condense to maybe 25 million, and once you add the aggregation processing time it would take longer.

Indices wouldn't help with full-table fetches, right? The query just says: return these fields from the full table.

What's a reasonable time to fetch 30 million records with select A, B, C from big_table?

Best Answer

does it take additional time for the SSMS client to actually display all 30 million rows, compared with an application fetching (and processing) the exact same query with no display requirements?

In short: it depends on what the application is doing with the data (aside from not displaying it), but yes, SSMS will take some time to prepare and display the results. If you set SSMS to display client statistics (one of the options on the default toolbars, or from the Query menu), you'll see "Client processing time" as a separately counted value.
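Alongside the client statistics panel, you can also ask the server to report its own CPU and elapsed time for the statement, which makes it easier to see how much of those 6 minutes is server work versus transfer and client processing. A minimal example, using the table and column names from the question:

    -- Server-side parse/compile and execution timings are printed to the Messages tab.
    SET STATISTICS TIME ON;

    SELECT A, B
    FROM dbo.huge_table;

    SET STATISTICS TIME OFF;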

In most circumstances the fetch will be no slower from the database's point of view, because SQL Server's engine is running the same statements and sending the same resulting data over the same network, but there will be a delay on the client at the end as the results are parsed and prepared for display.
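One way to see that split for yourself is to take the client out of the picture entirely. The sketch below (assuming the question's huge_table with a datetime column A and an int column B) assigns the columns to variables, so the server still reads every row but nothing is sent to or rendered by SSMS; comparing its elapsed time with the normal SELECT gives a rough idea of how much of the 6 minutes is transfer and client-side display.

    -- Rough sketch: the server scans every row and evaluates the columns,
    -- but no result set is transferred to or rendered by the client.
    DECLARE @a datetime, @b int;

    SELECT @a = A,
           @b = B
    FROM dbo.huge_table;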

There are a few caveats. One is that if the application is not pulling the whole result set down at once, but instead opening a cursor and pulling rows down individually (or a block at a time), then it may impose more effective load on the server, because it may end up holding locks for much longer. In this case the application will get the results more slowly because of the extra latency of the round trips to the server to request new rows. It may appear faster to the user, though, because the application can start displaying information from the first rows as soon as they are available instead of waiting for the whole result set to transfer. Getting the data in smaller blocks will also increase the amount of bandwidth used if you are connecting over a compressed VPN or tunnel, because the smaller segments may contain less redundancy for the compression algorithm to take advantage of.
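If you do go the chunked route from an application, one common pattern is keyset paging rather than a server-side cursor: each batch is an ordinary query that seeks past the last key already retrieved. A sketch, assuming a hypothetical indexed, ever-increasing key column Id on the question's table:

    -- Hypothetical chunked fetch; Id is an assumed indexed key column.
    DECLARE @lastId int = 0;      -- in practice, remembered by the calling application

    SELECT TOP (100000) Id, A, B
    FROM dbo.huge_table
    WHERE Id > @lastId
    ORDER BY Id;

    -- Repeat with @lastId set to the largest Id returned, until no rows come back.

As the question notes, the total time stays roughly the same; what changes is how soon the first rows are available and how long the server has to hold locks and the open result set.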

Should I aggregate or pre-calculate the 30 million?

We know nothing of your data, or what you are trying to do with it. What calculations? What aggregations? Without much more information we can't help there beyond wild guessing. The best thing would probably be to try it and see.

Indices wouldn't help with full table fetches.

They might if you are requesting that the data be presented in a specific order, or if you are only requesting some of the fields ("some fields from all rows" could become an index scan instead of a table scan and require fewer data pages to be read).
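For example, if the query only ever needs A, B, and C, a narrow covering index lets the engine scan the index instead of the full 8-column table, so fewer pages have to be read. A sketch (the index name and key choice are illustrative, not a recommendation for your schema):

    -- Illustrative covering index for: SELECT A, B, C FROM big_table
    -- Scanning this index reads fewer pages than scanning the full-width table.
    CREATE NONCLUSTERED INDEX IX_big_table_A_B_C
    ON dbo.big_table (A)
    INCLUDE (B, C);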

What's a reasonable time to fetch 30 million records with select A, B, C from big_table?

There are too many variables for us to even guess an answer to that.

the database is in Europe, and I'm in the United States

For remote databases, using some form of link compression will help with large result sets, though be aware that this will increase latency for small requests.