You can turn on trace flag 7300, which might give you a more detailed error message.
How many rows does a representative query return? How fast/reliable is the network connection between the two servers?
It's possible that a large dataset is taking too long to transfer (on top of the actual query time). You could raise the timeout value.
You can try to reconfigure the timeout setting as follows:
Set the remote login timeout to 300 seconds:
sp_configure 'remote login timeout', 300
go
reconfigure with override
go
Set the remote query timeout to 0 (infinite wait):
sp_configure 'remote query timeout', 0
go
reconfigure with override
go
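To confirm the change took effect, you can check the running values in sys.configurations; a minimal sketch (the configuration names below are as they appear in sys.configurations):

```sql
-- Verify the remote timeout settings currently in use
SELECT name, value_in_use
FROM sys.configurations
WHERE name IN ('remote login timeout (s)', 'remote query timeout (s)');
```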
Update: From SQL Server 2012 SP1 onwards, users with SELECT
permission can run DBCC SHOW_STATISTICS,
which improves read-only performance over linked servers. Ref: https://msdn.microsoft.com/en-us/library/ms174384(v=sql.110).aspx
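In practice this means that on the remote server, granting plain SELECT on the referenced table is enough for the linked-server provider to fetch statistics; a sketch, where the user and table names are placeholders:

```sql
-- On the REMOTE server (2012 SP1+): SELECT permission alone now lets the
-- linked-server provider retrieve statistics via DBCC SHOW_STATISTICS.
-- "LinkedUser" and dbo.Orders are hypothetical names.
GRANT SELECT ON dbo.Orders TO LinkedUser;
```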
Update: You are correct in saying it's not the size of the data or the connection speed. It rang a bell in my foggy memory and I remembered where I had seen it: Slow in the Application, Fast in SSMS? (An Issue With Linked Servers). It's not parameter sniffing; it is the statistics themselves that are missing (due to permissions), causing a bad query plan to be used:
You can see that the estimates are different. When I ran as sysadmin,
the estimate was 1 row, which is a correct number, since there are no
Orders in Northwind where the order ID exceeds 20000. But when I ran
as a plain user, the estimate was 249 rows. We recognize this
particular number as 30 % of 830 orders, or the estimate for an
inequality operation when the optimizer has no information.
Previously, this was due to an unknown variable value, but in this
case there is no variable that can be unknown. No, it is the
statistics themselves that are missing.
As long as a query accesses only tables in the local server, the
optimizer can always access the statistics for all tables in the
query; there are no extra permission checks. But this is different
with tables on a linked server. When SQL Server accesses a linked
server, there is no secret protocol that is only used for inter-server
communication. No, instead SQL Server uses the standard OLE DB
interface for linked servers, be it other SQL Server instances, Oracle,
text files or your home-brewed data source, and connects just like any
other user. Exactly how statistics are retrieved depends on the data
source and the OLE DB provider in question. In this case, the provider
is SQL Server Native Client which retrieves the statistics in two
steps. (You can see this by running Profiler against the remote
server). First the provider runs the procedure
sp_table_statistics2_rowset which returns information about which
column statistics there are, as well as their cardinality and their
density information. In the second step, the provider runs DBCC
SHOW_STATISTICS, a command that returns the full distribution
statistics. (We will look closer at this command later in this
article.) Here is the catch: to run DBCC SHOW_STATISTICS, you must be
a member of the server role sysadmin or of one of the database roles
db_owner or db_ddladmin.
And this is why I got different results. When running as sysadmin I
got the full distribution statistics which indicated that there are no
rows with order ID > 20000, and the estimate was one row. (Recall that
the optimizer never assumes zero rows from statistics.) But when
running as the plain user, DBCC SHOW_STATISTICS failed with a
permission error. This error was not propagated, but instead the
optimizer accepted that there were no statistics and used default
assumptions. Since it did get cardinality information, it learnt that
the remote table has 830 rows, whence the estimate of 249 rows.
Whenever you encounter a performance problem where a query that
includes access to a linked server is slow in the application, but it
runs fast when you test it from SSMS, you should always investigate if
insufficient permissions on the remote database could be the cause.
(Keep in mind that the access to the linked server may not be overt in
the query, but could be hidden in a view.) If you determine that
permissions on the remote database is the problem, what actions could
you take?
You can add the users to the role db_ddladmin, but since this gives them the right to add and drop tables, it is not recommended.
By default, when users connect to a remote server they connect as themselves, but you can set up a login mapping with
sp_addlinkedsrvlogin, so that users map to a proxy account that has
membership in db_ddladmin. Note that this proxy account must be an SQL
login, so this is not an option if the remote server does not have SQL
authentication enabled. This solution too is somewhat dubious from a
security perspective, although it's better than the previous suggestion.
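Such a mapping can be sketched as follows; the server, login, and password names are placeholders, and the remote login would then be given db_ddladmin membership in the remote database:

```sql
-- Map ALL local logins to one remote SQL login on the linked server.
-- @useself = 'FALSE' means "use the supplied remote credentials"
-- rather than the caller's own identity.
EXEC sp_addlinkedsrvlogin
    @rmtsrvname  = N'RemoteServer',   -- hypothetical linked server name
    @useself     = 'FALSE',
    @locallogin  = NULL,              -- NULL = applies to all local logins
    @rmtuser     = N'StatsProxy',     -- hypothetical remote SQL login
    @rmtpassword = N'StrongPassword!';
```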
In some cases you can rewrite the query with OPENQUERY to force evaluation on the remote server. This can be particularly useful if
the query includes several remote tables. (But it can also backfire,
because the optimizer now gets even less statistics information from
the remote server.)
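As an illustration of that rewrite, here is a sketch using the Northwind example from the quoted article (server and object names are assumptions):

```sql
-- Instead of a four-part name like RemoteServer.Northwind.dbo.Orders,
-- push the whole statement to the remote server, where it is optimized
-- with full access to that server's own statistics.
SELECT *
FROM OPENQUERY(RemoteServer,
    'SELECT OrderID, CustomerID
     FROM Northwind.dbo.Orders
     WHERE OrderID > 20000');
```

Note the trade-off mentioned above: the local optimizer now treats the OPENQUERY result as a black box, so joins against local tables may get even weaker estimates.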
You could of course use the full battery of hints and plan guides to get the plan you want.
Finally, you should ask yourself whether that linked-server access is needed. Maybe the databases could be on the same server? Could data
be replicated? Some other solution?
sys.dm_exec_query_plan returns a dbid column; you could just filter on that for the database that is being mirrored.
Or, you could get fancy and nest a subquery that filters for any dbid that is in a mirrored state by going out to the sys.database_mirroring catalog view.
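That nested filter might look like the following sketch; sys.database_mirroring has one row per database, with mirroring_guid NULL when the database is not mirrored:

```sql
-- Cached plans belonging to any database that is currently mirrored
SELECT qs.plan_handle, qp.dbid, qp.query_plan
FROM sys.dm_exec_query_stats AS qs
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
WHERE qp.dbid IN (SELECT database_id
                  FROM sys.database_mirroring
                  WHERE mirroring_guid IS NOT NULL);
```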
Best Answer
At the risk of stating the obvious, first decide what you are optimizing for. Almost always it is elapsed time, though you may be trying to reduce CPU use (to reduce license fees) or IO (to avoid a disk upgrade, say). Then find a way to measure that thing, holding other variables constant. Be sure your chosen measure actually measures what you target and not a bunch of other stuff too.
For example, running a query in SSMS and looking at the status bar execution time will show round-trip network time plus plan compilation time plus IO time plus query execution time plus client caching time plus SSMS rendering time. For large result sets and slow networks the overhead and variability may swamp any actual improvement.
Having a completely clean buffer pool and plan cache for each iteration will give great reproducibility. It is unlikely to match the actual production scenario, however. Since production is what you're trying to make fast, I'd suggest you test in a production-like environment, with similar memory and disk, plans compiled, and the working set in the buffer pool.
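If you do want the clean-cache variant for a controlled comparison, the usual sequence is a sketch like this (run it only on a test box, never in production, since it evicts everything for every session):

```sql
CHECKPOINT;               -- write dirty pages so clean buffers can be dropped
DBCC DROPCLEANBUFFERS;    -- empty the buffer pool (forces physical reads)
DBCC FREEPROCCACHE;       -- empty the plan cache (forces recompilation)
SET STATISTICS TIME ON;   -- report CPU time and elapsed time per statement
-- ... run the query under test here ...
```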