I have a performance bottleneck with a SELECT GROUP BY
operation.
Schema
I have a table like this:
CREATE TABLE [InverterData](
[InverterID] [bigint] NOT NULL,
[TimeStamp] [datetime] NOT NULL,
[ValueA] [decimal](18, 2) NULL,
[ValueB] [decimal](18, 2) NULL
CONSTRAINT [PrimaryKey_e149e28f-5754-4229-be01-65fafeebce16] PRIMARY KEY CLUSTERED
(
[TimeStamp] DESC,
[InverterID] ASC
) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF
, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON
, ALLOW_PAGE_LOCKS = ON)
)
and a Index
like this:
CREATE NONCLUSTERED INDEX [TimeStamp_Power-NonClusteredIndex] ON [dbo].[InverterData]
(
[InverterID] ASC,
[TimeStamp] ASC
)
INCLUDE
(
[ValueA],
[ValueB]
)
The [InverterData]
table as has following storage stats:
- Data space: 26,901.86 MB
- Row count: 131,827,749
- Partitioned: true
- Partition count: 62
Usage
With my descripted schema (one plus some extra table that are not important for my question) I can run super fast queries like this:
SELECT [TimeStamp], [ValueA], [ValueB]
FROM [InverterData]
JOIN [Inverter] ON [Inverter].[ID] = [InverterData].[InverterID]
JOIN [DataLogger] ON [DataLogger].[ID] = [Inverter].[DataLoggerID]
WHERE [DataLogger].[ProjectID] = 20686
AND [InverterData].[TimeStamp] >= '20160108'
AND [InverterData].[TimeStamp] < '20160109'
Excecution timespan: 178ms
Excecution plan:
Problem
I now want to make a SELECT GROUP BY
by [InverterID] and 15 minute interval of [TimeStamp].
Somethink like this:
SELECT [InverterID]
, DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [TimeStamp] ) / 15 * 15, 0) AS [TimeStamp]
, SUM([ValueA]), SUM([ValueB])
FROM [InverterData]
JOIN [Inverter] ON [Inverter].[ID] = [InverterData].[InverterID]
JOIN [DataLogger] ON [DataLogger].[ID] = [Inverter].[DataLoggerID]
WHERE [DataLogger].[ProjectID] = 20686
AND [InverterData].[TimeStamp] >= '20160107'
AND [InverterData].[TimeStamp] < '20160108'
GROUP BY
[InverterID], DATEADD(MINUTE, DATEDIFF(MINUTE, 0, [InverterData].[TimeStamp] ) / 15 * 15, 0)
Excecution timespan: 4637ms
Excecution plan:
Attempts
I think it can be related to the necessary Sort
operation here:
As far as I know it is possible to avoid this SORT
by creating a matching indexer. But I don't know how to do this with my 15 minute interval grouping.
Question
As you can see the excecution timespan of the SELECT GROUP BY
is massivly longer. But I don't know where and how to avoid the bottleneck!?
Update 1 (related to @Max Vernon answer)
If it is possible ide like to have a faster solution where I can flexible change the interval (for example 10, 15 or 6o minutes). So without calculated columns.
Best Answer
You could add a calculated column to the table and build an index from the calculation.
For instance, the table would be:
I've modified the name of the
TimeStamp
column toTS
sinceTimeStamp
is a reserved word.After looking at Paul's comment; I think a good index might be:
If you change the
WHERE
clause to operate on the indexed timestamp column, I get no sort in the plan. I figure the change to thewhere
clause is likely not a problem since you appear to be selecting on whole days.Granted, I've not included the other tables in my example, since you have not provided those details.
The plan for the query above is, with 100,000 rows in my sample table:
Having said all that, without the actual table definitions, including the partitioning scheme, etc, it is difficult to tell how well this will actually work for you.
Since adding a (non-persisted) computed column is a meta-data only operation, modifying the table should be nearly instantaneous. You'd still likely want to do this when you know there are no other transactions (or as few as possible) occurring so the required schema lock (although very short lived) won't block. The DDL to modify the table is: