Sql-server – ny way I can speed up this large full-table query

query-performancesql server

I have a query that selects from only one table and with one WHERE filter. However it takes a very long time to execute and even times out occasionally. This is likely because it is filtering about 4 million rows out from a table of 13 million rows (the other 9 million records are older than 2019), and it is returning all of the columns, of which there are 101 (a mix of datetime, varchar, and int columns). It has two indexes, a clustered one on its primary key interaction_id, and an unclustered index on interaction_date which is a datetime column that is the main filter. This is the query:

  SELECT * 
  FROM [Sales].[dbo].[Interaction] 
  WHERE 
  year(Interaction_date) >= 2019

Is there anything obvious I can do to improve this query's performance by adding/tweaking indexes or tweaking the query itself? Before I go into an ETL processes or fight back on the group that needs this query (they are a Hadoop sqooping team who insist they need to sqoop all of these records all the time with all of the columns), I want to see if I can make it easier on people by doing something on my end as the DBA.

The query plan by default ignores my non-clustered index on the interaction_date column and still does a full clustered index scan. So I then tried forcing it to use it by including WITH (INDEX(IX_Interaction_Interaction_Date)) in the select.

This forces it into the query plan startign with an index scan of the non-clustered index, with estimated rows 4 million but estimated rows to be read as all 13 million. Then after a short time it spends the rest of the execution on the key lookup of the primary clustered index.

But ultimately, it doesn't seem to speed up the query at all.

Best Answer

Is there anything obvious I can do to improve this query's performance by adding/tweaking indexes or tweaking the query itself?

Yes. First make the predicate sargable.

SELECT * FROM 
[Sales].[dbo].[Interaction] 
WHERE Interaction_date >= '20190101'

And then consider partitioning, or a filtered index with included columns. But even if you have an index that can support this query as a simple seek+scan, sending all the columns to the client takes time.