Sql-server – SQL Server – Prevent Clustered Index Scan on a specific table

execution-planquery-performancesql serversql-server-2016

My database contains a specific table that is quite large (250+M rows, 100+GB data space).

Generating a clustered index scan on this table is always a bad idea.

This table has multiple indexes and we never run a query on this table without specifying predicates on indexed columns.

Since we updated the server, we have experienced random query plan degradations. Queries that would run in a few seconds randomly start to timeout after multiple dozens of minutes.

Here are the server changes:

  • The compatibility level was updated from 2008 to 2016, thus the cardinality estimator changed.
  • SQL Server edition was updated from Standard to Enterprise.
  • SQL Server version (2016) was not changed.
  • The storage (SSD) was not changed.

Of course, I can easily solve this issue for one specific query by adding hints, but I have too many queries in my applications to broadly apply this solution.

Is there a way to globally prevent SQL Server from generating Clustered Index Scan on this table?

I was not able to reproduce the same bad query plans from my previous production issues (the bad query plans happen randomly), but I was able to generate similar query plans using a new simple query. I uploaded plans for the simple query without hint and with hint.

OPTION (RECOMPILE) could solve the issue. But it does not ensure that the new query plan will be better or that SQL Server will not switch to another query plan in a few days. I am not looking for a way to improve one particular query or to improve one particular query plan. I want to globally prevent Clustered Index Scan on my table.

I have many queries that query many different columns and I cannot include all columns in all indexes.

One global option that would probably solve my issues is to use the legacy cardinality estimator, but I hope to find something better.

Best Answer

Is there a way to globally prevent SQL Server from generating Clustered Index Scan on this table?

Yes. Replace the clustered index with a clustered columnstore index. It will be highly compressed. And scans can eliminate unneeded columns, and (critically here) unneeded row groups.

The query here is tricky to optimze:

select t.Id, t.ExternalId, t.Source, tn.Code
from XXX.LargeTable t
join Security tn on t.SecurityId = tn.Id
where t.TransactionTime >= @d1
and t.TransactionTime <= @d2
and tn.NationalityId = 22

The optimal plan depends almost entirely on how wide the range [@d1, @d2] is. But SQL Server can't have multiple plans for the same query. You're getting a table scan here as the alternative plan has a huge estimated cost, due to the double nested loop join. As it turns out if few enough rows qualify in the transaction table, that plan will be cheaper than the table scan, but SQL has to come up with a plan that works reasonably well for any values of @d1 and @d2.

If the date range turns out to be verry narrow, as here, the clustered index scan is a terrible plan, as it requires reading the whole table.

But in the case of a Clustered Columstore SQL Server only has to scan the TransactionTime column to find the matching rows. Not only that, each 1,000,000 row column segment has the min and max in the column segment header. So if the range is narrow many of the row groups can be eliminated without actually scanning all the TransactionTime values.

The "bad" plan had mainly IO waits

  <Wait WaitType="PAGEIOLATCH_SH" WaitTimeMs="523619" WaitCount="83735" />
  <Wait WaitType="MEMORY_ALLOCATION_EXT" WaitTimeMs="13910" WaitCount="14455586" />

Scanning a CCI for the same query would require a tiny, tiny fraction of the IO required for the clustered index scan.