SQL Server – Index Scan Actual Executions

execution-plan, index, optimization, sql-server

I have a fairly complicated query, and running it against two databases with the same schema but different data produces very different results. On database D1 the elapsed time is around 1 second and the number of returned rows is 4936. On database D2 the elapsed time is around 10 seconds and the number of returned rows is 135. The returned data is correct and expected in both cases, but the execution time is what confuses me.

Both databases have the same indexes. All statistics have been updated, and indexes have been reorganized or rebuilt where necessary. However, the actual execution plans on the two databases differ slightly in layout and cost.
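(For anyone who wants to verify the same thing, statistics freshness can be checked per table like this — dbo.YourTable is a placeholder, and sys.dm_db_stats_properties requires SQL Server 2008 R2 SP2 or later:)

    -- When was each statistics object last updated, and how many rows did it sample?
    SELECT s.name AS stats_name,
           sp.last_updated,
           sp.rows,
           sp.rows_sampled,
           sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID(N'dbo.YourTable');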

On database D2, 29% of the cost goes to an Index Scan of a non-clustered index that executes 409 times (estimated executions = 1) and returns 2195 rows, even though only 146 were estimated. Its output feeds a Nested Loops Inner Join (9% cost) together with a Clustered Index Seek (only 2% cost) on another table; that seek executes 2195 times (estimated executions = 146) and returns 2195 rows (estimated rows = 146, both estimates driven by the upper Index Scan).

On D1 the same Index Scan accounts for only 4.1% of the cost, executes once, and returns 4451 rows against an estimate of 4491. Its output passes through a Compute Scalar, and then joins via a Hash Match Inner Join with the output of another Compute Scalar fed by a Clustered Index Scan (where D2 uses a Clustered Index Seek instead).

Can anyone tell me why this might be happening?

Also, the plan on D2 contains another Index Scan of the same problematic index, also at 29% cost, which returns 0 rows against an estimate of 27. In D1 the same scan returns 7 rows against an estimate of 8 and costs 4%.
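(For reproducibility: the estimated vs. actual figures above can be read from the actual execution plans, which SSMS returns when the query is run like this:)

    SET STATISTICS XML ON;   -- return the actual plan, including actual row counts per operator
    SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement

    -- ... run the query on each database ...

    SET STATISTICS XML OFF;
    SET STATISTICS TIME OFF;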

I'm completely lost. Any help will be appreciated.

Best Answer

Schema and indexes are only one factor in query plans and performance. Your statement "but with different data" likely points to the source of the difference. Row counts and data distribution are essential inputs to the query optimizer. If D2 has significantly more rows, or data with entirely different characteristics (a wider or narrower range of values), then you should expect to see different execution plans and performance.
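One way to see that difference directly is to compare the statistics histograms behind the problematic index on D1 and D2 (table and index names below are placeholders for yours):

    -- Run on both databases and compare RANGE_HI_KEY, EQ_ROWS, and AVG_RANGE_ROWS
    DBCC SHOW_STATISTICS (N'dbo.YourTable', N'IX_YourProblemIndex')
        WITH HISTOGRAM;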

For each statistics object, SQL Server keeps a histogram with a maximum of 200 steps. As tables grow and the distribution of values becomes more irregular, it becomes more likely that SQL Server will not have enough information to generate optimal execution plans. That's where filtered indexes and filtered statistics come into play.
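As a sketch of that idea — names here are hypothetical — a filtered statistic (or filtered index) gives the optimizer a dedicated 200-step histogram for just the slice of data that is being misestimated:

    -- Dedicated statistics for one heavily queried, skewed slice of the table
    CREATE STATISTICS st_Orders_Status_Open
    ON dbo.Orders (Status)
    WHERE Status = 'Open'
    WITH FULLSCAN;

    -- The filtered-index equivalent, which also supports seeks on that slice
    CREATE NONCLUSTERED INDEX IX_Orders_Status_Open
    ON dbo.Orders (Status)
    INCLUDE (OrderDate)
    WHERE Status = 'Open';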

If this is a parameterized query you may also be running into a parameter sniffing problem. Note that if you're using local variables the cardinality estimation changes as well: the optimizer cannot sniff a local variable's value at compile time, so it falls back to average-density estimates instead of the histogram.
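A minimal illustration of both points, with hypothetical object names — the stored-procedure form is compiled for the sniffed parameter value, while the local-variable form uses density-based estimates; OPTION (RECOMPILE) is one common mitigation:

    -- Parameterized: the plan is compiled for the sniffed value of @CustomerId
    CREATE PROCEDURE dbo.GetOrders @CustomerId int
    AS
    BEGIN
        SELECT OrderId, OrderDate
        FROM dbo.Orders
        WHERE CustomerId = @CustomerId
        OPTION (RECOMPILE);  -- one mitigation: compile a fresh plan per call
    END;
    GO

    -- Local variable: the value is unknown at compile time, so the optimizer
    -- estimates from average density rather than the histogram
    DECLARE @c int = 42;
    SELECT OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @c;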