Sql-server – Why is Clustered Index Scan Number of Executions so high

optimizationsql server

I have two similar queries that generate the same query plan, except that one query plan executes a Clustered Index Scan 1316 times, while the other executes it 1 time.

The only difference between the two queries is different date criteria. The long running query actually narrower date criteria, and pulls back less data.

I have identified some indexes that will help with both queries, but I just want to understand why the Clustered Index Scan operator is executing 1316 times on a query that is virtually the same as the one that where it executes 1 time.

I checked the statistics on the PK that is being scanned, and they are relatively up to date.

Original query:

select distinct FIR_Incident.IncidentID
from FIR_Incident
left join (
    select incident_id as exported_incident_id
    from postnfirssummary
) exported_incidents on exported_incidents.exported_incident_id = fir_incident.incidentid
where FI_IncidentDate between '2011-06-01 00:00:00.000' and '2011-07-01 00:00:00.000'
    and exported_incidents.exported_incident_id is not null

Generates this plan:
enter image description here

After narrowing the date range criteria:

select distinct FIR_Incident.IncidentID
from FIR_Incident
left join (
    select incident_id as exported_incident_id
    from postnfirssummary
) exported_incidents on exported_incidents.exported_incident_id = fir_incident.incidentid
where FI_IncidentDate between '2011-07-01 00:00:00.000' and '2011-07-02 00:00:00.000'
    and exported_incidents.exported_incident_id is not null

Generates this plan:
enter image description here

Best Answer

The JOIN after the scan gives a clue: with less rows on one side of the last join (reading right to left of course) the optimiser chooses a "nested loop" not a "hash join".

However, before looking at this I'd aim to eliminate the Key Lookup and the DISTINCT.

  • Key lookup: your index on FIR_Incident should be covering, probably (FI_IncidentDate, incidentid) or the other way around. Or have both and see which is used more often (they both may be)

  • The DISTINCT is a consequence of the LEFT JOIN ... IS NOT NULL. The optimiser has already removed it (the plans have "left semi joins" on the final JOIN) but I'd use EXISTS for clarity

Something like:

select 
    F.IncidentID
from 
    FIR_Incident F
where 
    exists (SELECT * FROM postnfirssummary P
           WHERE P.incident_id = F.incidentid)
    AND
    F.FI_IncidentDate between '2011-07-01 00:00:00.000' and '2011-07-02 00:00:00.000'

You can also use plan guides and JOIN hints to make SQL Server use a hash join, but try to make it work normally first: a guide or a hint probably won't stand the test of time because they are only useful for the data and queries you run now, not in the future