SQL Server – How to Resolve Aggregation GROUP BY Bottleneck

execution-planindexsql serversql-server-2008

I have a performance bottleneck with a SELECT GROUP BY operation.

Schema

I have a table like this:

CREATE TABLE [InverterData](
    [InverterID] [bigint] NOT NULL,
    [TimeStamp] [datetime] NOT NULL,    
    [ValueA] [decimal](18, 2) NULL,
    [ValueB] [decimal](18, 2) NULL
    CONSTRAINT [PrimaryKey_e149e28f-5754-4229-be01-65fafeebce16] PRIMARY KEY CLUSTERED 
    (
        [TimeStamp] DESC,
        [InverterID] ASC
    ) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF
    , IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON
    , ALLOW_PAGE_LOCKS = ON)
)

and a Index like this:

CREATE NONCLUSTERED INDEX [TimeStamp_Power-NonClusteredIndex] ON [dbo].[InverterData]
(
    [InverterID] ASC,
    [TimeStamp] ASC
)
INCLUDE 
(   
    [ValueA],
    [ValueB]
)

The [InverterData] table as has following storage stats:

  • Data space: 26,901.86 MB
  • Row count: 131,827,749
  • Partitioned: true
  • Partition count: 62

Usage

With my descripted schema (one plus some extra table that are not important for my question) I can run super fast queries like this:

SELECT [TimeStamp], [ValueA], [ValueB]

FROM [InverterData]

JOIN [Inverter] ON [Inverter].[ID] = [InverterData].[InverterID]
JOIN [DataLogger] ON [DataLogger].[ID] = [Inverter].[DataLoggerID]

WHERE [DataLogger].[ProjectID] = 20686
 AND [InverterData].[TimeStamp] >= '20160108'   
 AND [InverterData].[TimeStamp] < '20160109'

Excecution timespan: 178ms

Excecution plan:

enter image description here

Problem

I now want to make a SELECT GROUP BY by [InverterID] and 15 minute interval of [TimeStamp].

Somethink like this:

SELECT [InverterID]
 , DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [TimeStamp] ) / 15 * 15, 0) AS [TimeStamp]
 , SUM([ValueA]), SUM([ValueB])

FROM [InverterData]

JOIN [Inverter] ON [Inverter].[ID] = [InverterData].[InverterID]
JOIN [DataLogger] ON [DataLogger].[ID] = [Inverter].[DataLoggerID]

WHERE [DataLogger].[ProjectID] = 20686
 AND [InverterData].[TimeStamp] >= '20160107'
 AND [InverterData].[TimeStamp] < '20160108'

GROUP BY
 [InverterID], DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [InverterData].[TimeStamp] ) / 15 * 15, 0)

Excecution timespan: 4637ms

Excecution plan:

enter image description here

Attempts

I think it can be related to the necessary Sort operation here:

enter image description here

As far as I know it is possible to avoid this SORT by creating a matching indexer. But I don't know how to do this with my 15 minute interval grouping.

Question

As you can see the excecution timespan of the SELECT GROUP BY is massivly longer. But I don't know where and how to avoid the bottleneck!?

Update 1 (related to @Max Vernon answer)

If it is possible ide like to have a faster solution where I can flexible change the interval (for example 10, 15 or 6o minutes). So without calculated columns.

Best Answer

You could add a calculated column to the table and build an index from the calculation.

For instance, the table would be:

CREATE TABLE dbo.InverterData
(
     InverterID bigint NOT NULL
    , TS datetime NOT NULL    
    , ValueA decimal(18, 2) NULL
    , ValueB decimal(18, 2) NULL
    , TS15 AS (DATEADD(MINUTE, DATEDIFF(MINUTE, 0, TS ) / 15 * 15, 0))
    , CONSTRAINT PK_InverterData PRIMARY KEY CLUSTERED 
    (
        TS DESC,
        InverterID ASC
    ) WITH (
        PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF
        , IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON
        , ALLOW_PAGE_LOCKS = ON
        )
);

I've modified the name of the TimeStamp column to TS since TimeStamp is a reserved word.

After looking at Paul's comment; I think a good index might be:

CREATE INDEX IX_InverterData_ID_TS151 
ON dbo.InverterData (TS15, InverterID);

If you change the WHERE clause to operate on the indexed timestamp column, I get no sort in the plan. I figure the change to the where clause is likely not a problem since you appear to be selecting on whole days.

SELECT InverterData.InverterID
    , InverterData.TS15
FROM InverterData
WHERE InverterData.TS15 >= '20140107'
    AND InverterData.TS15 < '20160108'
GROUP BY InverterData.InverterID
    , InverterData.TS15
OPTION (RECOMPILE);

Granted, I've not included the other tables in my example, since you have not provided those details.

The plan for the query above is, with 100,000 rows in my sample table:

enter image description here

Having said all that, without the actual table definitions, including the partitioning scheme, etc, it is difficult to tell how well this will actually work for you.

Since adding a (non-persisted) computed column is a meta-data only operation, modifying the table should be nearly instantaneous. You'd still likely want to do this when you know there are no other transactions (or as few as possible) occurring so the required schema lock (although very short lived) won't block. The DDL to modify the table is:

ALTER TABLE dbo.InverterData
ADD TS15 AS (DATEADD(MINUTE, DATEDIFF(MINUTE, 0, TS ) / 15 * 15, 0));