Sql-server – SQL Server – Prevent Clustered Index Scan on a specific table

execution-planquery-performancesql serversql-server-2016

My database contains a specific table that is quite large (250+M rows, 100+GB data space).

Generating a clustered index scan on this table is always a bad idea.

This table has multiple indexes and we never run a query on this table without specifying predicates on indexed columns.

Since we updated the server, we have experienced random query plan degradations. Queries that would run in a few seconds randomly start to timeout after multiple dozens of minutes.

Here are the server changes:

The compatibility level was updated from 2008 to 2016, thus the cardinality estimator changed.
SQL Server edition was updated from Standard to Enterprise.
SQL Server version (2016) was not changed.
The storage (SSD) was not changed.

Of course, I can easily solve this issue for one specific query by adding hints, but I have too many queries in my applications to broadly apply this solution.

Is there a way to globally prevent SQL Server from generating Clustered Index Scan on this table?

I was not able to reproduce the same bad query plans from my previous production issues (the bad query plans happen randomly), but I was able to generate similar query plans using a new simple query. I uploaded plans for the simple query without hint and with hint.

OPTION (RECOMPILE) could solve the issue. But it does not ensure that the new query plan will be better or that SQL Server will not switch to another query plan in a few days. I am not looking for a way to improve one particular query or to improve one particular query plan. I want to globally prevent Clustered Index Scan on my table.

I have many queries that query many different columns and I cannot include all columns in all indexes.

One global option that would probably solve my issues is to use the legacy cardinality estimator, but I hope to find something better.

Best Answer

Is there a way to globally prevent SQL Server from generating Clustered Index Scan on this table?

Yes. Replace the clustered index with a clustered columnstore index. It will be highly compressed. And scans can eliminate unneeded columns, and (critically here) unneeded row groups.

The query here is tricky to optimze:

select t.Id, t.ExternalId, t.Source, tn.Code
from XXX.LargeTable t
join Security tn on t.SecurityId = tn.Id
where t.TransactionTime >= @d1
and t.TransactionTime <= @d2
and tn.NationalityId = 22

The optimal plan depends almost entirely on how wide the range [@d1, @d2] is. But SQL Server can't have multiple plans for the same query. You're getting a table scan here as the alternative plan has a huge estimated cost, due to the double nested loop join. As it turns out if few enough rows qualify in the transaction table, that plan will be cheaper than the table scan, but SQL has to come up with a plan that works reasonably well for any values of @d1 and @d2.

If the date range turns out to be verry narrow, as here, the clustered index scan is a terrible plan, as it requires reading the whole table.

But in the case of a Clustered Columstore SQL Server only has to scan the TransactionTime column to find the matching rows. Not only that, each 1,000,000 row column segment has the min and max in the column segment header. So if the range is narrow many of the row groups can be eliminated without actually scanning all the TransactionTime values.

The "bad" plan had mainly IO waits

  <Wait WaitType="PAGEIOLATCH_SH" WaitTimeMs="523619" WaitCount="83735" />
  <Wait WaitType="MEMORY_ALLOCATION_EXT" WaitTimeMs="13910" WaitCount="14455586" />

Scanning a CCI for the same query would require a tiny, tiny fraction of the IO required for the clustered index scan.

Related Solutions

Sql-server – Clustered Index Update on Nonclustered Columns

When you have a clustered index on a table, the clustered index IS the table!

Mentally you can substitute "table" for "clustered index" in this instance and it will make sense.

The data for every field in every row is in your clustered index. The clustered index just sets the order of the physical pages in the database to be organized by your clustering key(s).

You can always fall back on the phone book analogy for these things, too: in your classic phone book, the data is clustered on Last Name, First Name. Each entry still has PhoneNum, Address at the leaf level but you don't order by that. The pages are in physical order by the the keys.

I can't advise on optimizing the query unless you show us the table and query you are running, but basically this cost will be paid one way or another. If you don't update the clustered index it will be a table update and a table scan.

Sql-server – Why clustered index scan

This table is very small!

It has 20 rows of which 2 match the search condition. The table definition contains three columns and two indexes (which both support uniqueness constraints).

CREATE TABLE Person.ContactType(
    ContactTypeID int IDENTITY(1,1) NOT NULL,
    Name dbo.Name NOT NULL,
    ModifiedDate datetime NOT NULL,
    CONSTRAINT PK_ContactType_ContactTypeID PRIMARY KEY CLUSTERED(ContactTypeID),
    CONSTRAINT AK_ContactType_Name UNIQUE NONCLUSTERED(Name)
)

Running

SELECT index_type_desc,
       index_depth,
       page_count,
       avg_page_space_used_in_percent,
       avg_record_size_in_bytes
FROM   sys.dm_db_index_physical_stats(db_id(), 
                                      object_id('Person.ContactType'), 
                                      NULL, 
                                      NULL, 
                                      'DETAILED')

Shows both indexes only consist of a single leaf page with no upper level pages.

+--------------------+-------------+------------+--------------------------------+--------------------------+
|  index_type_desc   | index_depth | page_count | avg_page_space_used_in_percent | avg_record_size_in_bytes |
+--------------------+-------------+------------+--------------------------------+--------------------------+
| CLUSTERED INDEX    |           1 |          1 | 15.9130219915987               | 62.5                     |
| NONCLUSTERED INDEX |           1 |          1 | 13.1949592290586               | 51.5                     |
+--------------------+-------------+------------+--------------------------------+--------------------------+

Rows on each index page aren't necessarily in index key order but each page has a slot array with the offset of each row on the page. This is maintained in index order.

The nonclustered index covers two out of the three columns (Name as a key column and ContactTypeID as a row locator back to the base table) but is missing ModifiedDate.

You can use index hints to force the NCI seek as below

SELECT ct.*
FROM   Person.ContactType AS ct WITH (INDEX = AK_ContactType_Name)
WHERE  ct.Name LIKE 'Own%';

But you can see that under SQL Server's cost model this plan is given a higher estimated cost than the competing CI scan (roughly double).

enter image description here

The single page clustered index scan would just need to read all the 20 rows on the page, evaluate the predicate against them and return them.

The single page nonclustered index range seek might potentially be able to perform a binary search on the slot array to reduce the number of rows evaluated however the index does not cover the query so it would also need a potential IO to retrieve the CI page and then it would still need to locate the row with the missing column values on there (for each row returned by the NCI seek).

On my machine running 1 million iterations of the non clustered index plan took 15.245 seconds compared to 11.113 seconds for the clustered index plan. Whilst this is far from double the plan without the hint was measurably faster.

Even if the table was orders of magnitude larger however you may well still not get your expected plan with lookups.

SQL Server's costing model prefers sequential scans to random IO lookups and the "tipping point" between it choosing a scan of a covering index or a seek and lookups of a non covering one is often surprisingly low as discussed in Kimberley Tripp's blog post here.

It is certainly not out of the question that it would choose such a plan for a 10% selective predicate but the clustered index would likely need to be quite a lot wider than the NCI for it to do so.

Best Answer

Related Solutions

Sql-server – Clustered Index Update on Nonclustered Columns

Sql-server – Why clustered index scan

Related Question