Sql-server – Optimize simple query in SQL Server

Tags: performance, query-performance, sql-server

This should be a quite easy query, but I honestly think its execution time can be improved.

select idTag,MAX(pctimestamp) AS PCTIMESTAMP,getdate() AS NOW, datediff(SECOND,MAX(pctimestamp),getdate()) AS DELAY 
from ValuesTagsOPC
group by IdTag

This query returns 1386 rows from the 'ValuesTagsOPC' table, which contains about 40 million rows and has the following structure, taken from the CREATE script:

CREATE TABLE [dbo].[ValuesTagsOPC](
    [IdTag] [int] NOT NULL,
    [TTimeStamp] [datetime] NOT NULL,
    [PCTimeStamp] [datetime] NOT NULL,
    [Value] [nvarchar](50) NOT NULL,
    [Quality] [int] NOT NULL,
 CONSTRAINT [PK_ValuesTagsOPC] PRIMARY KEY CLUSTERED 
(
    [IdTag] ASC,
    [PCTimeStamp] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Obviously, there is a clustered index on its primary key.
Time and IO statistics from SQL Server are the following (translated from the Spanish output):

(1386 row(s) affected)
Table 'ValuesTagsOPC'. Scan count 5, logical reads 224612, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

 SQL Server Execution Times:
   CPU time = 9578 ms,  elapsed time = 3019 ms.

I've checked that the estimated execution plan is the same as the actual one, and it tells me that 94 % of the cost comes from the clustered index scan. What I don't understand is why it is not performing a seek instead of a scan, since all the columns the query needs are included in the clustered index.

Thanks in advance!!

Best Answer

The query has no WHERE clause, so it must process and aggregate all 40 million rows. SQL Server will not exploit the index order to skip ahead to the next IdTag once it has found the MAX for the current group; it keeps reading the remaining rows in that group. Each group averages about 30,000 rows.

Since you have another table that lists the 1,386 distinct IdTag values, you could try the following instead.

SELECT D.IdTag,
       V.PCTimeStamp,
       V.Now,
       datediff(SECOND, V.PCTimeStamp, V.Now) AS DELAY
FROM   DescriptionTagsOPC D
       CROSS APPLY (SELECT TOP 1 V.PCTimeStamp,
                                 getdate() AS Now
                    FROM   ValuesTagsOPC V
                    WHERE  D.IdTag = V.IdTag
                    ORDER  BY PCTimeStamp DESC) V

This replaces the scan of 40 million rows with 1,386 seeks.
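As a rough check of the rewrite, here is a minimal sketch that assumes miniature stand-ins for the two tables (table variables with made-up rows, not the real data). It is meant to show that the APPLY form returns the same latest timestamp per tag as the GROUP BY form. Multi-row VALUES needs SQL Server 2008 or later.

```sql
-- Hypothetical miniature copies of the two tables, for illustration only.
DECLARE @DescriptionTagsOPC TABLE (IdTag int NOT NULL PRIMARY KEY);
DECLARE @ValuesTagsOPC TABLE (
    IdTag       int      NOT NULL,
    PCTimeStamp datetime NOT NULL,
    PRIMARY KEY CLUSTERED (IdTag, PCTimeStamp)
);

INSERT INTO @DescriptionTagsOPC (IdTag) VALUES (1), (2), (3);

INSERT INTO @ValuesTagsOPC (IdTag, PCTimeStamp)
VALUES (1, '20240101 10:00:00'),
       (1, '20240101 10:05:00'),
       (2, '20240101 09:30:00');   -- tag 3 deliberately has no readings

-- Original form: aggregate over every row in the values table.
SELECT IdTag, MAX(PCTimeStamp) AS PCTimeStamp
FROM   @ValuesTagsOPC
GROUP  BY IdTag
ORDER  BY IdTag;

-- Rewritten form: one TOP 1 lookup per tag listed in the description table.
SELECT D.IdTag, Latest.PCTimeStamp
FROM   @DescriptionTagsOPC AS D
       CROSS APPLY (SELECT TOP 1 V.PCTimeStamp
                    FROM   @ValuesTagsOPC AS V
                    WHERE  V.IdTag = D.IdTag
                    ORDER  BY V.PCTimeStamp DESC) AS Latest
ORDER  BY D.IdTag;
```

Both statements should return tag 1 with 10:05 and tag 2 with 09:30. Tag 3 appears in neither result, because CROSS APPLY drops outer rows whose subquery returns nothing; OUTER APPLY would keep it with a NULL timestamp. On the real tables, the logical reads reported by SET STATISTICS IO are the number to compare.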

If that table were not available, a recursive CTE could achieve similar results by starting at the highest IdTag and, on each step, jumping to the last row of the next lower IdTag.

WITH    RecursiveCTE
AS      (
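        -- Anchor: the highest IdTag with its latest PCTimeStamp. The TOP/ORDER BY
        -- sits in a derived table because ORDER BY cannot directly precede UNION ALL.
        SELECT  A.IdTag, A.PCTimeStamp
        FROM    (
                SELECT TOP 1 IdTag, PCTimeStamp
                FROM ValuesTagsOPC
                ORDER BY IdTag DESC, PCTimeStamp DESC
                ) A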
        UNION   ALL
        SELECT  R.IdTag, R.PCTimeStamp
        FROM    (
                SELECT  V.*,
                        rn = ROW_NUMBER() OVER (ORDER BY V.IdTag DESC, V.PCTimeStamp DESC)
                FROM    ValuesTagsOPC V
                JOIN    RecursiveCTE R
                        ON  V.IdTag < R.IdTag
                ) R
        WHERE   R.rn = 1
        )
SELECT  IdTag,
        PCTimeStamp,
        getdate()                                 AS NOW,
        datediff(SECOND, PCTimeStamp, getdate()) AS DELAY
FROM    RecursiveCTE
OPTION  (MAXRECURSION 0);

Related Solutions

Sql-server – Parent-Child Tree Hierarchical ORDER

OK, enough brain cells are dead.


WITH cte AS
(
  SELECT 
    [ICFilterID], 
    [ParentID],
    [FilterDesc],
    [Active],
    CAST(0 AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters]
  WHERE [ParentID] = 0
  UNION ALL
  SELECT 
    i.[ICFilterID], 
    i.[ParentID],
    i.[FilterDesc],
    i.[Active],  
    Level + CAST(i.[ICFilterID] AS varbinary(max)) AS Level
  FROM [dbo].[ICFilters] i
  INNER JOIN cte c
    ON c.[ICFilterID] = i.[ParentID]
)

SELECT 
  [ICFilterID], 
  [ParentID],
  [FilterDesc],
  [Active]
FROM cte
ORDER BY [Level];
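
To see how the concatenated binary path drives the ordering, here is a small self-contained sketch. The table variable, its five rows, and the column name SortPath (used in place of Level) are assumptions made up for illustration; only the technique mirrors the query above.

```sql
-- Hypothetical sample hierarchy: one root (ParentID = 0) with two branches.
DECLARE @ICFilters TABLE (
    ICFilterID int          NOT NULL PRIMARY KEY,
    ParentID   int          NOT NULL,
    FilterDesc nvarchar(50) NOT NULL
);

INSERT INTO @ICFilters (ICFilterID, ParentID, FilterDesc)
VALUES (1, 0, N'Root'),
       (2, 1, N'Branch A'),
       (3, 1, N'Branch B'),
       (4, 2, N'Leaf under A'),
       (5, 3, N'Leaf under B');

WITH cte AS
(
    -- Anchor: the root starts the path with four zero bytes.
    SELECT ICFilterID, ParentID, FilterDesc,
           CAST(0 AS varbinary(max)) AS SortPath
    FROM   @ICFilters
    WHERE  ParentID = 0
    UNION ALL
    -- Each child appends its own 4-byte ID to the parent's path.
    SELECT i.ICFilterID, i.ParentID, i.FilterDesc,
           c.SortPath + CAST(i.ICFilterID AS varbinary(max))
    FROM   @ICFilters AS i
           INNER JOIN cte AS c
                   ON c.ICFilterID = i.ParentID
)
SELECT ICFilterID, ParentID, FilterDesc, SortPath
FROM   cte
ORDER  BY SortPath;
```

With this data the rows should come back in depth-first order: 1, 2, 4, 3, 5. A parent's path is a prefix of every descendant's path, so the parent sorts first, and siblings sort by ID. This relies on the IDs being positive integers.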

Sql-server – Comparing two queries in SQL Server 2012

I love your careful approach to query tuning and to reviewing options and plans; I wish more developers did this. One caution: always test with a lot of rows and look at the logical reads, because this is a smallish table. Try to generate a sample load and run the queries again. One small issue: your top query has no ORDER BY, while your bottom query does. You should compare and contrast them with ordering in each.

I quickly created a SalesOrders table with 200,000 sales orders in it (still not huge by any stretch of the imagination) and ran the queries with the ORDER BY in each. I also played with indexes a bit.

With no clustered index on OrderID and just a non-clustered index on CustID, the second query outperformed the first, especially with the ORDER BY included in each. The first query did twice as many reads as the second, and the cost percentages were 67% / 33% between the queries.

With a clustered index on OrderID and a non-clustered index on just CustID, they ran at similar speed with exactly the same number of reads.

So I would suggest you increase the number of rows and do some more testing. My final analysis of your queries:

You may find them behaving more similarly than you realize when you increase the rows, so keep that caveat in mind and test that way.

If all you ever want to return is the maximum OrderID for each customer, then the second of these two queries is the best way to go, to my mind: although it is ever so slightly more expensive based on subtree cost, it is simpler and easier to decipher. If you intend to add other columns to your result set someday, the first query allows you to do that.

Updated: One of your comments under your question was:

Please keep in mind, that finding the best query in this question is a means of refining the techniques used for comparing them.

But the best takeaway for doing that is to test with more data: always make sure you have data consistent with production and expected future production. Query plans start looking different when you give the tables more rows, so try to keep the distribution close to what you'd expect in production. And pay attention to things like whether an ORDER BY is included; here I don't think it makes much difference in the end, but it is still worth digging into.

Your approach of comparing at this level of detail is a good one. Subtree costs are mostly arbitrary and meaningless on their own, but they are still worth looking at for comparison between edits or even between queries. The time statistics and the IO are quite important, as is checking the plan for anything that feels out of place for the size of the data you are working with and what you are trying to do.
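
A minimal sketch of such a test harness follows. Only the names SalesOrders, OrderID and CustID come from the answer; the temp table definition, the even spread over 1,000 customers, the index name, and the query under test are all assumptions for illustration.

```sql
-- Hypothetical test table modelled on the SalesOrders table described above.
CREATE TABLE #SalesOrders (
    OrderID int NOT NULL PRIMARY KEY CLUSTERED,
    CustID  int NOT NULL
);

-- Generate 200,000 orders spread evenly over 1,000 customers (an assumed
-- distribution; production data is usually more skewed).
INSERT INTO #SalesOrders (OrderID, CustID)
SELECT n.rn, n.rn % 1000 + 1
FROM  (SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
       FROM   sys.all_objects AS a
              CROSS JOIN sys.all_objects AS b) AS n
WHERE  n.rn <= 200000;

CREATE NONCLUSTERED INDEX IX_SalesOrders_CustID ON #SalesOrders (CustID);

-- Report logical reads and CPU/elapsed time for each statement that follows.
SET STATISTICS IO ON;
SET STATISTICS TIME ON;

-- Stand-in for a query under test: the highest OrderID per customer.
SELECT CustID, MAX(OrderID) AS MaxOrderID
FROM   #SalesOrders
GROUP  BY CustID
ORDER  BY CustID;

SET STATISTICS IO OFF;
SET STATISTICS TIME OFF;

DROP TABLE #SalesOrders;
```

Run each candidate query between the SET ... ON and SET ... OFF lines and compare the logical reads and CPU time shown in the Messages tab. Repeating the run after dropping or changing an index shows how much the result depends on the indexing.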
