I have a query that I want to speed up:
SELECT
sum(case when FlagDTD = 1 then Success else 0 end) as SuccessDTD
, sum(case when FlagDTD = 1 then [Error] else 0 end) as ErrorDTD
, round(sum(case when FlagDTD = 1 then Success else 0 end) * 100.0 / sum(FlagDTD),2)
as RateDTD
, sum(case when FlagYTD = 1 then Success else 0 end) as SuccessYTD
, sum(case when FlagYTD = 1 then [Error] else 0 end) as ErrorYTD
, round(sum(case when FlagYTD = 1 then Success else 0 end) * 100.0 / sum(FlagYTD),2)
as RateYTD
FROM
(
SELECT
CASE WHEN Message = 'OK' then 1 else 0 end as Success
, CASE WHEN Message <> 'OK' then 1 else 0 end as [Error]
, CASE WHEN DateCreated >
dateadd(HOUR, datediff(hh,GetUTCDate(), GetDate())*-1, DATEADD(yy,
DATEDIFF(yy,0,getdate()), 0)) then 1 else 0 end as FlagYTD
, CASE WHEN DateCreated >
dateadd(HOUR, datediff(hh,GetUTCDate(), GetDate())*-1 ,
convert(varchar(10), getdate(), 101)) then 1 else 0 end as FlagDTD
FROM
[Channels4].[dbo].[NotificationResult]
) Cnts
I thought maybe I could create a view or indexed view based on the subquery. However, upon testing I can't create an indexed view because "the view uses an implicit conversion from string to datetime or smalldatetime".
I tried using a traditional view, but that gave no performance enhancement at all. My next thought would be to maybe rewrite the entire query. What are everyone's thoughts?
Plan:
https://www.brentozar.com/pastetheplan/?id=rJqGY7iZW
Table Structure:
CREATE TABLE [dbo].[NotificationResult]
(
[IdNotificationResult] [bigint] IDENTITY(1,1) NOT NULL,
[ApplicationGuid] [nvarchar](48) NOT NULL,
[MessageGuid] [nvarchar](48) NOT NULL,
[IdNotificationResultTypeStatus] [int] NOT NULL,
[MessageStatusCode] [int] NULL,
[Message] [varchar](max) NULL,
[ExceptionStatusCode] [int] NULL,
[ExceptionMessage] [varchar](max) NULL,
[Subject] [varchar](max) NULL,
[From] [varchar](max) NULL,
[Timestamp] [datetime] NOT NULL,
[IdCreatedBy] [bigint] NOT NULL,
[IdLastUpdatedBy] [bigint] NOT NULL,
[DateCreated] [datetime] NOT NULL,
[DateLastUpdated] [datetime] NOT NULL,
CONSTRAINT [PK_NotificationResult] PRIMARY KEY CLUSTERED
(
[IdNotificationResult]
)
);
CREATE NONCLUSTERED INDEX [IX_NotificationResult_DateMessage]
ON [dbo].[NotificationResult] ( [DateCreated] ASC ) INCLUDE ([Message]);
Doing a quick count for YTD, and DTD, I get this number for both: 11739267. The number of rows that are 'OK': 11782564.
Best Answer
As written you technically haven't asked a question here. I assume that you want to improve the performance of your query, but keep in mind that defining an acceptable response time is sometimes an important part of performance tuning. If a query runs once per day and takes a minute to finish, is it really worth 8 hours of your time to make it run in 1 second?
More important than performance is correctness. It doesn't matter how long the query takes if it returns the wrong results, although of course taking a long time to return the wrong results is worse than taking a short time to do so. Depending on your time zone, the UTC conversion logic might not work out the way you expect: if any data is affected by daylight saving time then you can't use the current hour difference between local time and UTC to convert old data.
Setting all of that aside, I'm going to try to show you a few ways to speed up the query in the question. You have a covering index which is a good start, especially because it avoids reading the unrelated blob data. However there are still ways to speed up the query. I'm deliberately ignoring the clues that you gave about data distribution because I want to make this a more general answer that could help others and perhaps could be more helpful to you if your data changes in the future.
I mocked up 10 million rows with half of them having "OK" for the message and the other half having a long string. The dates are spread out over a few years. WARNING: the mock-up takes around 60 GB of space and took around 10 minutes to generate on my machine.
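The original data-generation script isn't shown here, but a sketch along these lines would produce a similar distribution (the column values, the `-n / 5` date spread, and the `REPLICATE` payload size are my assumptions):

```sql
-- Generate ~10M mock rows: half 'OK', half an 8000-byte string,
-- with DateCreated spread back over roughly the last four years.
INSERT INTO [dbo].[NotificationResult] WITH (TABLOCK)
    ([ApplicationGuid], [MessageGuid], [IdNotificationResultTypeStatus],
     [Message], [Timestamp], [IdCreatedBy], [IdLastUpdatedBy],
     [DateCreated], [DateLastUpdated])
SELECT TOP (10000000)
      'app-guid'
    , 'msg-guid'
    , 1
    , CASE WHEN n % 2 = 0 THEN 'OK' ELSE REPLICATE('X', 8000) END
    , GETDATE()
    , 0
    , 0
    , DATEADD(MINUTE, -n / 5, GETDATE())  -- ~2M minutes back = ~3.8 years
    , GETDATE()
FROM
(
    SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS n
    FROM sys.all_objects a
    CROSS JOIN sys.all_objects b
    CROSS JOIN sys.all_objects c
) nums;
```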
If I run the query in the question I get the same query plan as you. It took 38 seconds on my machine.
The first opportunity to improve performance is that you have an implied WHERE clause predicate in your CASE statements. The query optimizer isn't smart enough to realize that rows from previous years can never contribute to the totals. We know that 24 hours will always be longer than the difference between local time and UTC time, so adding a filter that excludes rows created before the last day of the previous year shouldn't change the results.
With that filter in place, instead of reading and aggregating 10 million rows from the index, SQL Server only has to process 1.4 million rows. The savings you get from this optimization depend on how the data is distributed in the table. If all of your data is in the current year then performance won't improve yet. For my data, the query now finishes in 5 seconds.
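Concretely, the filter could look something like this (my reconstruction; the exact cutoff expression wasn't given, so treat it as an assumption):

```sql
-- Appended to the inner query. The 24-hour buffer before Jan 1 of the
-- current year covers any possible local/UTC offset, so no row excluded
-- here could ever have FlagYTD = 1 or FlagDTD = 1.
WHERE DateCreated > DATEADD(DAY, -1,
          DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0))
```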
We can do better than that. We're storing a VARCHAR(MAX) column in the index when all we really need to know is whether the column value is "OK" or not. Without changing the table definition we can create three smaller filtered indexes to seek or scan against. The idea is that these indexes contain the data we need but are in total much smaller on disk than the existing IX_NotificationResult_DateMessage index. Getting the query optimizer to use the filtered indexes required a query rewrite and an index hint (I'm not sure why). With the rewrite, the query finishes in less than a second.
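The three filtered indexes might look like this (a sketch; the index names are my assumptions):

```sql
-- One narrow index per Message outcome. None of these carries the
-- VARCHAR(MAX) data itself, so together they are far smaller than
-- IX_NotificationResult_DateMessage. If the engine rejects a
-- VARCHAR(MAX) column in a filter predicate, a persisted computed
-- flag column could be filtered instead.
CREATE NONCLUSTERED INDEX [IX_NotificationResult_OK]
    ON [dbo].[NotificationResult] ([DateCreated])
    WHERE [Message] = 'OK';

CREATE NONCLUSTERED INDEX [IX_NotificationResult_NotOK]
    ON [dbo].[NotificationResult] ([DateCreated])
    WHERE [Message] <> 'OK';

CREATE NONCLUSTERED INDEX [IX_NotificationResult_NullMessage]
    ON [dbo].[NotificationResult] ([DateCreated])
    WHERE [Message] IS NULL;
```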
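One way the rewritten query might look (a sketch; the variable names and index hints are my assumptions, and rows with a NULL Message are ignored here):

```sql
-- Compute the period boundaries once, mirroring the original expressions.
DECLARE @UtcOffsetHours INT = DATEDIFF(HOUR, GETUTCDATE(), GETDATE());
DECLARE @StartOfYear DATETIME = DATEADD(HOUR, -@UtcOffsetHours,
        DATEADD(YEAR, DATEDIFF(YEAR, 0, GETDATE()), 0));
DECLARE @StartOfDay  DATETIME = DATEADD(HOUR, -@UtcOffsetHours,
        CAST(CAST(GETDATE() AS DATE) AS DATETIME));

SELECT
      SuccessDTD
    , ErrorDTD
    , ROUND(SuccessDTD * 100.0 / NULLIF(SuccessDTD + ErrorDTD, 0), 2) AS RateDTD
    , SuccessYTD
    , ErrorYTD
    , ROUND(SuccessYTD * 100.0 / NULLIF(SuccessYTD + ErrorYTD, 0), 2) AS RateYTD
FROM
(
    SELECT
          (SELECT COUNT(*) FROM dbo.NotificationResult
                WITH (INDEX ([IX_NotificationResult_OK]))
            WHERE Message = 'OK'  AND DateCreated > @StartOfDay)  AS SuccessDTD
        , (SELECT COUNT(*) FROM dbo.NotificationResult
                WITH (INDEX ([IX_NotificationResult_NotOK]))
            WHERE Message <> 'OK' AND DateCreated > @StartOfDay)  AS ErrorDTD
        , (SELECT COUNT(*) FROM dbo.NotificationResult
                WITH (INDEX ([IX_NotificationResult_OK]))
            WHERE Message = 'OK'  AND DateCreated > @StartOfYear) AS SuccessYTD
        , (SELECT COUNT(*) FROM dbo.NotificationResult
                WITH (INDEX ([IX_NotificationResult_NotOK]))
            WHERE Message <> 'OK' AND DateCreated > @StartOfYear) AS ErrorYTD
) Cnts;
```

Note that the original rates divide by every row in the period, including rows with a NULL Message; if NULL messages exist, counts from the third filtered index would need to be added to the denominators.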
It's true that we read more rows from the indexes than before, but the filtered indexes are in total about 100 times smaller than the original index.
If that query still isn't fast enough you can consider an indexed view. If you had a UTC date column in the table then it would be straightforward to create a view which is eligible to be indexed. I believe such a view roughly captures your intent, although I probably got some details wrong. On my data, querying the indexed view finishes in 66 ms.
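A sketch of what that could look like, assuming a hypothetical [DateCreatedUtc] column (the real table only stores local times, so the column name, the view name, and the per-UTC-day granularity are all my assumptions):

```sql
CREATE VIEW dbo.NotificationResultCounts
WITH SCHEMABINDING
AS
SELECT
      CONVERT(DATE, DateCreatedUtc) AS UtcDay
    , SUM(CASE WHEN Message = 'OK' THEN 1 ELSE 0 END) AS SuccessCnt
    , SUM(CASE WHEN Message <> 'OK' THEN 1 ELSE 0 END) AS ErrorCnt
    , COUNT_BIG(*) AS TotalCnt  -- required for an indexed view with GROUP BY
FROM dbo.NotificationResult
GROUP BY CONVERT(DATE, DateCreatedUtc);
GO
-- Materializing the view leaves roughly one row per UTC day.
CREATE UNIQUE CLUSTERED INDEX CI_NotificationResultCounts
    ON dbo.NotificationResultCounts (UtcDay);
GO
-- The YTD part of the report then becomes a tiny range scan
-- (NOEXPAND forces use of the materialized view on Standard Edition).
SELECT SUM(SuccessCnt) AS SuccessYTD, SUM(ErrorCnt) AS ErrorYTD
FROM dbo.NotificationResultCounts WITH (NOEXPAND)
WHERE UtcDay >= DATEADD(YEAR, DATEDIFF(YEAR, 0, GETUTCDATE()), 0);
```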