I have a query that selects from only one table with a single WHERE filter. However, it takes a very long time to execute and occasionally times out. This is likely because it returns about 4 million of the table's 13 million rows (the other 9 million records are older than 2019), and it returns all of the columns, of which there are 101 (a mix of datetime, varchar, and int columns). The table has two indexes: a clustered one on its primary key interaction_id, and a non-clustered index on interaction_date, the datetime column that is the main filter. This is the query:
SELECT *
FROM [Sales].[dbo].[Interaction]
WHERE
year(Interaction_date) >= 2019
Is there anything obvious I can do to improve this query's performance, either by adding or tweaking indexes or by tweaking the query itself? Before I build an ETL process or push back on the group that needs this query (they are a Hadoop sqooping team who insist they need to sqoop all of these records, all the time, with all of the columns), I want to see if I can make things easier by doing something on my end as the DBA.
By default the query plan ignores my non-clustered index on the interaction_date column and does a full clustered index scan. So I tried forcing the index by adding WITH (INDEX(IX_Interaction_Interaction_Date)) to the SELECT.
This produces a plan that starts with an index scan of the non-clustered index, with an estimated 4 million rows but an estimated 13 million rows to be read. After a short time it spends the rest of the execution on key lookups against the clustered primary key index.
But ultimately, it doesn't speed up the query at all.
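For reference, the hinted form described above looks like this (index and table names as given earlier; the hint only forces the access path, it does not make the predicate any cheaper to evaluate):

```sql
SELECT *
FROM [Sales].[dbo].[Interaction] WITH (INDEX(IX_Interaction_Interaction_Date))
WHERE year(Interaction_date) >= 2019;
```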
Best Answer
Yes. First, make the predicate sargable: wrapping the column in YEAR() prevents the optimizer from seeking on the index, because the function has to be evaluated against every row.
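A sargable version compares the column directly to a constant, so a seek on IX_Interaction_Interaction_Date becomes possible (a sketch of the rewrite, using the table and column names from the question):

```sql
SELECT *
FROM [Sales].[dbo].[Interaction]
-- Compare the raw column instead of YEAR(Interaction_date),
-- so the predicate can match the index on Interaction_date.
WHERE Interaction_date >= '20190101';
```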
And then consider partitioning the table on the date, or a filtered index with included columns. But even with an index that can support this query as a simple seek plus range scan, sending 4 million 101-column rows to the client takes time.
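A filtered index restricted to the hot date range might look like the sketch below. The index name and the included column names are hypothetical; note that with 101 columns, including all of them would effectively duplicate the filtered slice of the table, so weigh the storage and write-amplification cost before creating it.

```sql
-- Hypothetical filtered index covering only rows from 2019 onward.
-- Column names in INCLUDE are placeholders; in practice you would list
-- the columns the consumers actually read.
CREATE NONCLUSTERED INDEX IX_Interaction_Since2019
ON [Sales].[dbo].[Interaction] (Interaction_date)
INCLUDE (Customer_id, Interaction_type)
WHERE Interaction_date >= '20190101';
```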