SQL Server – Index Scan Actual Executions

execution-plan, index, optimization, sql-server

I have a fairly complicated query, and running it against two databases with the same schema but different data produces very different results. On database D1 the elapsed time is around 1 second and the number of returned rows is 4936. On database D2 the elapsed time is around 10 seconds and the number of returned rows is 135. The returned data is correct and expected in both cases, but the execution time is what confuses me.

Both databases have the same indexes. All statistics have been updated, and indexes have been reorganized or rebuilt where necessary. However, the actual execution plans on the two databases differ slightly in layout and cost.
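(For anyone who wants to verify the same thing, statistics freshness can be checked per table like this — dbo.YourTable is a placeholder, and sys.dm_db_stats_properties requires SQL Server 2008 R2 SP2 or later:)

    -- When was each statistics object last updated, and how many rows did it sample?
    SELECT s.name AS stats_name,
           sp.last_updated,
           sp.rows,
           sp.rows_sampled,
           sp.modification_counter
    FROM sys.stats AS s
    CROSS APPLY sys.dm_db_stats_properties(s.object_id, s.stats_id) AS sp
    WHERE s.object_id = OBJECT_ID(N'dbo.YourTable');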

On database D2, 29% of the cost goes to an Index Scan of a non-clustered index that executes 409 times (estimated executions = 1) and returns 2195 rows, even though only 146 were estimated. Its output feeds a Nested Loops Inner Join (9% cost) together with a Clustered Index Seek (only 2% cost) on another table; that seek executes 2195 times (estimated executions = 146) and returns 2195 rows (estimated rows = 146, both estimates driven by the upper Index Scan).

On D1 the same Index Scan accounts for only 4.1% of the cost, executes once, and returns 4451 rows against an estimate of 4491. Its output passes through a Compute Scalar, and then joins via a Hash Match Inner Join with the output of another Compute Scalar fed by a Clustered Index Scan (where D2 uses a Clustered Index Seek instead).

Can anyone tell me why this might be happening?

Also, the plan on D2 contains another Index Scan of the same problematic index, also at 29% cost, which returns 0 rows against an estimate of 27. In D1 the same scan returns 7 rows against an estimate of 8 and costs 4%.
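(For reproducibility: the estimated vs. actual figures above can be read from the actual execution plans, which SSMS returns when the query is run like this:)

    SET STATISTICS XML ON;   -- return the actual plan, including actual row counts per operator
    SET STATISTICS TIME ON;  -- report CPU and elapsed time per statement

    -- ... run the query on each database ...

    SET STATISTICS XML OFF;
    SET STATISTICS TIME OFF;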

I'm completely lost. Any help will be appreciated.

Best Answer

Schema and indexes are only one factor in query plans and performance. Your statement "but with different data" likely points to the source of the difference. Row counts and data distribution are essential inputs to the query optimizer. If D2 has significantly more rows, or data with entirely different characteristics (a wider or narrower range of values), then you should expect to see different execution plans and performance.
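One way to see that difference directly is to compare the statistics histograms behind the problematic index on D1 and D2 (table and index names below are placeholders for yours):

    -- Run on both databases and compare RANGE_HI_KEY, EQ_ROWS, and AVG_RANGE_ROWS
    DBCC SHOW_STATISTICS (N'dbo.YourTable', N'IX_YourProblemIndex')
        WITH HISTOGRAM;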

For each statistics object, SQL Server keeps a histogram with a maximum of 200 steps. As tables grow and the distribution of values becomes more irregular, it becomes more likely that SQL Server will not have enough information to generate optimal execution plans. That's where filtered indexes and filtered statistics come into play.
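As a sketch of that idea — names here are hypothetical — a filtered statistic (or filtered index) gives the optimizer a dedicated 200-step histogram for just the slice of data that is being misestimated:

    -- Dedicated statistics for one heavily queried, skewed slice of the table
    CREATE STATISTICS st_Orders_Status_Open
    ON dbo.Orders (Status)
    WHERE Status = 'Open'
    WITH FULLSCAN;

    -- The filtered-index equivalent, which also supports seeks on that slice
    CREATE NONCLUSTERED INDEX IX_Orders_Status_Open
    ON dbo.Orders (Status)
    INCLUDE (OrderDate)
    WHERE Status = 'Open';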

If this is a parameterized query you may also be running into a parameter sniffing problem. Note that if you're using local variables the cardinality estimation changes as well: the optimizer cannot sniff a local variable's value at compile time, so it falls back to average-density estimates instead of the histogram.
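A minimal illustration of both points, with hypothetical object names — the stored-procedure form is compiled for the sniffed parameter value, while the local-variable form uses density-based estimates; OPTION (RECOMPILE) is one common mitigation:

    -- Parameterized: the plan is compiled for the sniffed value of @CustomerId
    CREATE PROCEDURE dbo.GetOrders @CustomerId int
    AS
    BEGIN
        SELECT OrderId, OrderDate
        FROM dbo.Orders
        WHERE CustomerId = @CustomerId
        OPTION (RECOMPILE);  -- one mitigation: compile a fresh plan per call
    END;
    GO

    -- Local variable: the value is unknown at compile time, so the optimizer
    -- estimates from average density rather than the histogram
    DECLARE @c int = 42;
    SELECT OrderId, OrderDate
    FROM dbo.Orders
    WHERE CustomerId = @c;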