SQL Server – Reducing I/O in Clustered Index Scans

execution-planperformancequery-performancesql server

The plan attached should not take more than a minute to run, but sometimes it takes hours. The I/O on the two outer clustered index scans balloons out of control and the query crawls. Can someone explain to me why this is happening and how to fix it?

https://www.brentozar.com/pastetheplan/?id=Hyp7qKFEx

Best Answer

Let's trace through the query plan from right to left. SQL Server estimates that only a single row will be returned from the t1 derived table. For each row returned it will do a clustered index scan of the VoterTelephones table in node ids 20 and 26. This means that if the cardinality estimate from the t1 derived table is wrong then SQL Server could end up doing many more clustered index scans than it expected when costing this plan. For example, if a thousand rows are returned from the t1 derived table then SQL Server will may do 2000 clustered index scans on the VoterTelephones table. If SQL Server had a more accurate cardinality estimate then it probably would have picked a different query plan.

There's another less technical way to get to a similar conclusion. Let's start at the left of the query plan this time. SQL Server thinks that only a single row will be inserted into the #tmpKeep1 table. Is that true? If you expect it to not be true then the estimated cost for the query plan could be quite inaccurate. The plan has an estimated cost of 50 magic optimizer units yet the query sometimes takes hours to finish. That's another reason to think that the cost is inaccurate compared to what actually happens when executing the query. If SQL Server has made bad assumptions or estimates about your data then it may have picked a sub-optimal query plan to return your data.

To improve query performance from I would start by trying to correct the cardinality estimate returned from the t1 derived table:

--get dupe cell phones
select TelAreaCode, TelNumber 
from VoterTelephones 
where TelCellFlag = 1 group by TelAreaCode, TelNumber having count(*) > 1

The most straightforward way to do this would be to put the results from that query into a temp table. SQL Server will gather stats on the temp table so you will get a more accurate cardinality estimate which is likely to change the query plan.

Depending on your version of SQL Server you may be able to improve the estimate by creating a multi-column statistics object on the VoterTelephones table, or even just by updating the stats with FULLSCAN. A filtered statistics that includes the where TelCellFlag = 1 predicate could help as well. The idea here is that if you give SQL Server more information about your data then it may be able to give you a better cardinality estimate for that derived table.

It may be true that it's too difficult to fix the cardinality estimate from that part of the query. In that case you could try creating indexes on the VoterTelephones table that eliminates the need for SQL Server to do a clustered index scan for each row in the outer table. If you create covering indexes on the table then SQL Server will do an index seek per row in the outer result set instead of a clustered index scan per row. That should be far more efficient.

For the joins to the table aliased as vt you filter on the TelCellFlag column, join on the TelAreaCode and TelNumber columns, and also need the LALVoterID and TelMatchScore columns for other parts of the query. You can get that information by carefully parsing the query text or by looking at the nested loop join operator with a node ID of 7 and the clustered index scan with a node ID of 20. For example, here is the predicate for the scan:

[S_MS].[dbo].[VoterTelephones].[TelCellFlag] as [vt].[TelCellFlag]=(1)

Here is the join predicate for the nested loop operator:

[S_MS].[dbo].[VoterTelephones].[TelAreaCode] as [vt].[TelAreaCode]
=[S_MS].[dbo].[VoterTelephones].[TelAreaCode] 
AND [S_MS].[dbo].[VoterTelephones].[TelNumber] as [vt].[TelNumber]
=[S_MS].[dbo].[VoterTelephones].[TelNumber]

Here is the output column list for the clustered index scan:

[S_MS].[dbo].[VoterTelephones].LALVoterID,
[S_MS].[dbo].[VoterTelephones].TelNumber, 
[S_MS].[dbo].[VoterTelephones].TelMatchScore,  
[S_MS].[dbo].[VoterTelephones].TelAreaCode

You can combine that data to figure out your needs for the index. You want predicate and join columns to be in the index columns and the other necessary columns to be included columns. Something like this could work:

CREATE INDEX NAME_YOUR_INDEX ON VoterTelephones (TelCellFlag, TelAreaCode, TelNumber) 
INCLUDE (LALVoterID, TelMatchScore);

You can perform a similar analysis for the other index scan in the plan (node ID 26).

Depending on your application, it's also possible that you're running into plan caching or statistics caching issues with your temp tables. I think that this is unlikely to be the cause but I'm including it in case other ideas don't work out. You can test for this by including a RECOMPILE hint in your query. Depending on how often this query runs that may not be a good option.