I have a query that selects from only one table with a single WHERE filter. However, it takes a very long time to execute and occasionally times out. This is likely because it returns about 4 million of the table's 13 million rows (the other 9 million records are older than 2019), and it returns all of the columns, of which there are 101 (a mix of datetime, varchar, and int columns). The table has two indexes: a clustered one on its primary key interaction_id, and a non-clustered index on interaction_date, the datetime column that is the main filter. This is the query:
SELECT *
FROM [Sales].[dbo].[Interaction]
WHERE
year(Interaction_date) >= 2019
Is there anything obvious I can do to improve this query's performance, either by adding or tweaking indexes or by tweaking the query itself? Before I build an ETL process or push back on the group that needs this query (they are a Hadoop sqooping team who insist they need to sqoop all of these records, all the time, with all of the columns), I want to see if I can make things easier by doing something on my end as the DBA.
By default the query plan ignores my non-clustered index on the interaction_date column and does a full clustered index scan. So I tried forcing the index by adding WITH (INDEX(IX_Interaction_Interaction_Date)) to the SELECT.
This produces a plan that starts with an index scan of the non-clustered index, with an estimated 4 million rows but an estimated 13 million rows to be read. After a short time it spends the rest of the execution on key lookups against the clustered primary key index.
But ultimately, it doesn't speed up the query at all.
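For reference, the hinted form described above looks like this (index and table names as given earlier; the hint only forces the access path, it does not make the predicate any cheaper to evaluate):

```sql
SELECT *
FROM [Sales].[dbo].[Interaction] WITH (INDEX(IX_Interaction_Interaction_Date))
WHERE year(Interaction_date) >= 2019;
```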
Best Answer
Yes. First, make the predicate sargable: wrapping the column in YEAR() prevents the optimizer from seeking on the index, because the function has to be evaluated against every row.
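A sargable version compares the column directly to a constant, so a seek on IX_Interaction_Interaction_Date becomes possible (a sketch of the rewrite, using the table and column names from the question):

```sql
SELECT *
FROM [Sales].[dbo].[Interaction]
-- Compare the raw column instead of YEAR(Interaction_date),
-- so the predicate can match the index on Interaction_date.
WHERE Interaction_date >= '20190101';
```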
And then consider partitioning the table on the date, or a filtered index with included columns. But even with an index that can support this query as a simple seek plus range scan, sending 4 million 101-column rows to the client takes time.
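A filtered index restricted to the hot date range might look like the sketch below. The index name and the included column names are hypothetical; note that with 101 columns, including all of them would effectively duplicate the filtered slice of the table, so weigh the storage and write-amplification cost before creating it.

```sql
-- Hypothetical filtered index covering only rows from 2019 onward.
-- Column names in INCLUDE are placeholders; in practice you would list
-- the columns the consumers actually read.
CREATE NONCLUSTERED INDEX IX_Interaction_Since2019
ON [Sales].[dbo].[Interaction] (Interaction_date)
INCLUDE (Customer_id, Interaction_type)
WHERE Interaction_date >= '20190101';
```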