Sql-server – Optimize simple query in SQL Server

performancequery-performancesql server

This should be a quite easy query, but I honestly think its execution time can be improved.

select idTag,MAX(pctimestamp) AS PCTIMESTAMP,getdate() AS NOW, datediff(SECOND,MAX(pctimestamp),getdate()) AS DELAY 
from ValuesTagsOPC
group by IdTag

This query returns 1386 rows from 'ValuesTagsOPC' table, that contains about 40 million rows, and has the following structure, retrieved by the create script:

CREATE TABLE [dbo].[ValuesTagsOPC](
    [IdTag] [int] NOT NULL,
    [TTimeStamp] [datetime] NOT NULL,
    [PCTimeStamp] [datetime] NOT NULL,
    [Value] [nvarchar](50) NOT NULL,
    [Quality] [int] NOT NULL,
 CONSTRAINT [PK_ValuesTagsOPC] PRIMARY KEY CLUSTERED 
(
    [IdTag] ASC,
    [PCTimeStamp] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Obviously, there is a clustered index on its primary key.
Time and IO statistics from SQL Server are the following (sorry, it's in spanish):

(1386 filas afectadas)
Tabla 'ValuesTagsOPC'. Recuento de exámenes 5, lecturas lógicas 224612, lecturas físicas 0, lecturas anticipadas 0, lecturas lógicas de LOB 0, lecturas físicas de LOB 0, lecturas anticipadas de LOB 0.

 Tiempos de ejecución de SQL Server:
   Tiempo de CPU = 9578 ms, tiempo transcurrido = 3019 ms.

I've checked that the estimated execution plan is the same as the real one, and it tells me that 94 % of the cost comes from the cluster index scan. What I don't understand is why it is not performing a seek instead of a scan, since all required fields in the query are included in the clustered index….

Thanks in advance!!

Best Answer

It has no WHERE clause so it must process and aggregate all 40 million rows. SQL Server will not take advantage of the index order and skip scan ahead to the next IdTag once it has found the MAX for the current group but will continue processing the other rows in that group. Each group has an average of about 30,000 rows.

As you have another table that lists the 1,386 distinct IdTag types then you could try the following instead.

SELECT D.IdTag,
       V.PCTimeStamp,
       V.Now,
       datediff(SECOND, V.PCTimeStamp, V.Now) AS DELAY
FROM   DescriptionTagsOPC D
       CROSS APPLY (SELECT TOP 1 *,
                                 getdate() AS Now
                    FROM   ValuesTagsOPC V
                    WHERE  D.IdTag = V.IdTag
                    ORDER  BY PCTimeStamp DESC) V 

To replace the scan of 40 million rows with 1,386 seeks.

If that table was not available then a recursive CTE could be used to achieve similar results.

WITH    RecursiveCTE
AS      (
        SELECT TOP 1 IdTag, PCTimeStamp
        FROM ValuesTagsOPC
        ORDER BY IdTag DESC, PCTimeStamp DESC
        UNION   ALL
        SELECT  R.IdTag, R.PCTimeStamp
        FROM    (
                SELECT  V.*,
                        rn = ROW_NUMBER() OVER (ORDER BY V.IdTag DESC, V.PCTimeStamp DESC)
                FROM    ValuesTagsOPC V
                JOIN    RecursiveCTE R
                        ON  V.IdTag < R.IdTag
                ) R
        WHERE   R.rn = 1
        )
SELECT  IdTag,
        PCTimeStamp,
        getdate()                                 AS NOW,
        datediff(SECOND, PCTimeStamp, getdate()) AS DELAY
FROM    RecursiveCTE
OPTION  (MAXRECURSION 0);