Sql-server – Clustered Index Scan on 5+ billion rows instead of using non clusterd index for <1000 rows

nonclustered-indexnosqloptimizationperformancequery-performancesql-server-2017

We have a huge SQL 2017 database containing more than 5 billion internetpages. One of the table contains the info about the page: PageID is the primary clustered key and UriKey is the non clustered index so we can lookup based on the Uri of the page. There are a lot more columns in that table as well, related to that page.

If we join with other tables and we join on the UriKey, the query will use the non clustered index only if we only require the UriKey itself. Once we add another column, it does a clustered index scan, wich can take up to days to complete.

This uses the correct index and takes less than a second to complete
https://www.brentozar.com/pastetheplan/?id=rkk1vIb07

select kps.Page_ID, p.PageID, r.*, p2.UriKey
from LB_Keywords_Pages_Static kps
    inner join spidir2.dbo.se_pages p 
    on kps.page_id=p.pageid
    inner join Temp_KeyboostRedirects r
    on p.UriKey like '%/'+ r.source +'/%'
    inner join Spidir2.dbo.SE_Pages p2 
    on p2.UriKey='https://www.keyboost.nl'+ r.target
where Account_ID=-2147482063

This selects one column extra (p2.Domain_ID) and takes more than a day to complete https://www.brentozar.com/pastetheplan/?id=ByOpUIb0m

select kps.Page_ID, p.PageID, r.*, p2.UriKey, p2.Domain_ID
from LB_Keywords_Pages_Static kps
    inner join spidir2.dbo.se_pages p 
    on kps.page_id=p.pageid
    inner join Temp_KeyboostRedirects r
    on p.UriKey like '%/'+ r.source +'/%'
    inner join Spidir2.dbo.SE_Pages p2 
    on p2.UriKey='https://www.keyboost.nl'+ r.target
where Account_ID=-2147482063

When we specify the index to be used, it works like a charm https://www.brentozar.com/pastetheplan/?id=SkvkKIbAm

select kps.Page_ID, p.PageID, r.*, p2.UriKey, p2.Domain_ID
from LB_Keywords_Pages_Static kps
    inner join spidir2.dbo.se_pages p 
    on kps.page_id=p.pageid
    inner join Temp_KeyboostRedirects r
    on p.UriKey like '%/'+ r.source +'/%'
    inner join Spidir2.dbo.SE_Pages p2 with (index(SE_Pages_UriKey))
    on p2.UriKey='https://www.keyboost.nl'+ r.target
where Account_ID=-2147482063

I noticed this strange behavior also with other keys and queries. I didn't notice this behavior before when we were using SQL 2016. This is causing our database to become very slow.

If we really have to specify every time the index to be used, wouldn't it be better to move to a nosql database?

Best Answer

Sometimes the person writing T-SQL has more accurate information than the query optimizer. In all of your query plans, SQL Server has a final cardinality estimate of 5360890000 rows, which is how many rows are in the dbo.SE_Pages table. Only a single row is ever returned though, and in the comments you said:

most of the time it['s result set] is only a selection of a few thousand

Something is going very wrong with cardinality estimation. From the optimizer's point of view, it's giving you the best plan that it can. It thinks that all plan options are expensive because billions of rows need to be returned. When returning all rows, in general, a clustered index scan will be costed much cheaper than using an NCI + key lookups to return every single row. The bad cardinality estimate results in a query plan that does the scan which takes days. The key to getting better performance without index hints is to fix the cardinality estimate.

You said that this problem seemed to start happening after you upgraded from SQL Server 2016 to SQL Server 2017. Did you gather statistics after the upgrade? The two most likely causes are that there's something wrong with the statistics for dbo.SE_Pages or Microsoft made a change in SQL Server 2017 that's harmful to your workload.

You can test if statistics are the cause by updating them for that table. You can test if SQL Server 2017 is the problem by changing the compatibility level of the relevant database to 130, although this could have other effects on your application. If you need another temporary workaround that doesn't involve an index try FAST 1. That hint encourages the optimizer to prefer plans which return the first row as quickly as possible. In your case, it's a way to disfavor plans that do a scan. If you need additional help you'll need to add more information to your question. The histogram for UriKey and the table structure (along with an explanation as to why you're partitioning by UriKey) are likely to be helpful.