Seeking help to improve the performance of this query.
SQL Server 2008 R2 Enterprise, 16 GB max server memory, 40 CPUs, max degree of parallelism 4.
SELECT DsJobStat.JobName AS JobName
, AJF.ApplGroup AS GroupName
, DsJobStat.JobStatus AS JobStatus
, AVG(CAST(DsJobStat.ElapsedSec AS FLOAT)) AS ElapsedSecAVG
, AVG(CAST(DsJobStat.CpuMSec AS FLOAT)) AS CpuMSecAVG
FROM DsJobStat, AJF
WHERE DsJobStat.NumericOrderNo=AJF.OrderNo
AND DsJobStat.Odate=AJF.Odate
AND DsJobStat.JobName NOT IN( SELECT [DsAvg].JobName FROM [DsAvg] )
GROUP BY DsJobStat.JobName
, AJF.ApplGroup
, DsJobStat.JobStatus
HAVING AVG(CAST(DsJobStat.ElapsedSec AS FLOAT)) <> 0;
Execution message:
(0 row(s) affected)
Table 'AJF'. Scan count 11, logical reads 45, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DsAvg'. Scan count 2, logical reads 1926, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DsJobStat'. Scan count 1, logical reads 3831235, physical reads 85, read-ahead reads 3724396, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 67268 ms, elapsed time = 90206 ms.
Tables' structure:
-- 212271023 rows
CREATE TABLE [dbo].[DsJobStat](
[OrderID] [nvarchar](8) NOT NULL,
[JobNo] [int] NOT NULL,
[Odate] [datetime] NOT NULL,
[TaskType] [nvarchar](255) NULL,
[JobName] [nvarchar](255) NOT NULL,
[StartTime] [datetime] NULL,
[EndTime] [datetime] NULL,
[NodeID] [nvarchar](255) NULL,
[GroupName] [nvarchar](255) NULL,
[CompStat] [int] NULL,
[RerunCounter] [int] NOT NULL,
[JobStatus] [nvarchar](255) NULL,
[CpuMSec] [int] NULL,
[ElapsedSec] [int] NULL,
[StatusReason] [nvarchar](255) NULL,
[NumericOrderNo] [int] NULL,
CONSTRAINT [PK_DsJobStat] PRIMARY KEY CLUSTERED
( [OrderID] ASC,
[JobNo] ASC,
[Odate] ASC,
[JobName] ASC,
[RerunCounter] ASC
));
-- 48992126 rows
CREATE TABLE [dbo].[AJF](
[JobName] [nvarchar](255) NOT NULL,
[JobNo] [int] NOT NULL,
[OrderNo] [int] NOT NULL,
[Odate] [datetime] NOT NULL,
[SchedTab] [nvarchar](255) NULL,
[Application] [nvarchar](255) NULL,
[ApplGroup] [nvarchar](255) NULL,
[GroupName] [nvarchar](255) NULL,
[NodeID] [nvarchar](255) NULL,
[Memlib] [nvarchar](255) NULL,
[Memname] [nvarchar](255) NULL,
[CreationTime] [datetime] NULL,
CONSTRAINT [AJF$PrimaryKey] PRIMARY KEY CLUSTERED
( [JobName] ASC,
[JobNo] ASC,
[OrderNo] ASC,
[Odate] ASC
));
-- 413176 rows
CREATE TABLE [dbo].[DsAvg](
[JobName] [nvarchar](255) NULL,
[GroupName] [nvarchar](255) NULL,
[JobStatus] [nvarchar](255) NULL,
[ElapsedSecAVG] [float] NULL,
[CpuMSecAVG] [float] NULL
);
CREATE NONCLUSTERED INDEX [DJS_Dashboard_2] ON [dbo].[DsJobStat]
( [JobName] ASC,
[Odate] ASC,
[StartTime] ASC,
[EndTime] ASC
)
INCLUDE ( [OrderID],
[JobNo],
[NodeID],
[GroupName],
[JobStatus],
[CpuMSec],
[ElapsedSec],
[NumericOrderNo]) ;
CREATE NONCLUSTERED INDEX [Idx_Dashboard_AJF] ON [dbo].[AJF]
( [OrderNo] ASC,
[Odate] ASC
)
INCLUDE ( [SchedTab],
[Application],
[ApplGroup]) ;
CREATE NONCLUSTERED INDEX [DsAvg$JobName] ON [dbo].[DsAvg]
( [JobName] ASC
);
Execution plan:
https://www.brentozar.com/pastetheplan/?id=rkUVhMlXM
Update after getting an answer
Thank you so much, @Joe Obbish.
You're right about where the issue in this query lies: it is the anti join between DsJobStat and DsAvg, not so much how the tables are joined or whether NOT IN is used.
There is indeed such a table, as you guessed.
CREATE TABLE [dbo].[DSJobNames](
[JobName] [nvarchar](255) NOT NULL,
CONSTRAINT [DSJobNames$PrimaryKey] PRIMARY KEY CLUSTERED
( [JobName] ASC
) );
I tried your suggestion:
SELECT DsJobStat.JobName AS JobName
, AJF.ApplGroup AS GroupName
, DsJobStat.JobStatus AS JobStatus
, AVG(CAST(DsJobStat.ElapsedSec AS FLOAT)) AS ElapsedSecAVG
, Avg(CAST(DsJobStat.CpuMSec AS FLOAT)) AS CpuMSecAVG
FROM DsJobStat
INNER JOIN DSJobNames jn
ON jn.[JobName]= DsJobStat.[JobName]
INNER JOIN AJF
ON DsJobStat.Odate=AJF.Odate
AND DsJobStat.NumericOrderNo=AJF.OrderNo
WHERE NOT EXISTS ( SELECT 1 FROM [DsAvg] WHERE jn.JobName = [DsAvg].JobName )
GROUP BY DsJobStat.JobName, AJF.ApplGroup, DsJobStat.JobStatus
HAVING AVG(CAST(DsJobStat.ElapsedSec AS FLOAT)) <> 0;
Execution message:
(0 row(s) affected)
Table 'DSJobNames'. Scan count 5, logical reads 1244, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DsAvg'. Scan count 5, logical reads 2129, physical reads 0, read-ahead reads 24, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'DsJobStat'. Scan count 8, logical reads 84, physical reads 0, read-ahead reads 83, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'AJF'. Scan count 5, logical reads 757999, physical reads 944, read-ahead reads 757311, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
(1 row(s) affected)
SQL Server Execution Times:
CPU time = 21776 ms, elapsed time = 33984 ms.
Execution plan: https://www.brentozar.com/pastetheplan/?id=rJVkLSZ7f
Best Answer
Let's start by considering join order. You have three table references in the query. Which join order might give you the best performance? The query optimizer thinks that the join from DsJobStat to DsAvg will eliminate almost all of the rows (cardinality estimates fall from 212195000 rows to 1 row). The actual plan shows that the estimate is pretty close to reality (11 rows survive the join). However, the join is implemented as a right anti semi merge join, so all 212 million rows from the DsJobStat table are scanned just to produce 11 rows. That is certainly contributing to the long query execution time, but I can't think of a physical or logical operator for that join that would have been better. I'm sure that the DJS_Dashboard_2 index is used by other queries, but all of the extra key and included columns just require more IO for this query and slow you down. So you potentially have a table access problem with the index scan on the DsJobStat table.
I'm going to assume that the join to AJF isn't very selective. It currently isn't relevant to the performance issues that you're seeing in the query, so I'm going to ignore it for the rest of this answer. That could change if the data in the table changes.
The other problem that's apparent from the plan is the row count spool operator. It's a very lightweight operator, but it executes over 200 million times. The operator is there because the query is written with NOT IN: if there is a single NULL row in DsAvg then all rows must be eliminated, and the spool is the implementation of that check. That probably isn't the logic that you want, so you'd be better off rewriting that part to use NOT EXISTS. The actual benefit of that rewrite will depend on your system and data.
I mocked up some data based on the query plan to test a few query rewrites. My table definitions are significantly different from yours because it would have been too much effort to mock up data for every single column. Even with the abbreviated data structures I was able to reproduce the performance issue that you're experiencing.
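As an aside, the NOT IN behavior described above is easy to demonstrate in isolation. This throwaway example (not from the original answer) shows that a single NULL produced by the subquery eliminates every outer row:

```sql
-- Hypothetical illustration of NOT IN vs NULL semantics.
-- x NOT IN (1, NULL) expands to x <> 1 AND x <> NULL;
-- x <> NULL evaluates to UNKNOWN, so no row ever qualifies.
SELECT t.x
FROM (VALUES (1), (2), (3)) AS t(x)
WHERE t.x NOT IN (SELECT s.y FROM (VALUES (1), (NULL)) AS s(y));
-- returns zero rows; a NOT EXISTS version of the same check
-- would return 2 and 3
```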
Based on the query plan, we can see that there are around 200000 unique JobName values in the DsAvg table. Based on the actual number of rows after the join to that table, we can see that almost all of the JobName values in DsJobStat are also in the DsAvg table. Thus, the DsJobStat table has 200001 unique values for the JobName column and 1000 rows per value.
I believe that this query represents the performance issue:
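The query in question is not shown in this copy of the answer; presumably it was a reduced form along these lines, isolating just the anti join between DsJobStat and DsAvg (a sketch, not necessarily Joe Obbish's exact code):

```sql
-- Sketch: only the DsJobStat-to-DsAvg anti join, keeping the
-- problematic NOT IN from the original query.
SELECT ds.JobName
FROM dbo.DsJobStat AS ds
WHERE ds.JobName NOT IN (SELECT a.JobName FROM dbo.DsAvg AS a);
```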
All of the other stuff in your query plan (GROUP BY, HAVING, ancient-style join, etc.) happens after the result set has been reduced to 11 rows. It currently doesn't matter from a query performance point of view, but there could be other concerns there which could be revealed by changed data in your tables.
I'm testing on SQL Server 2017, but I get the same basic plan shape as you.
On my machine, that query takes 62219 ms of CPU time and 65576 ms of elapsed time to execute. If I rewrite the query to use NOT EXISTS:
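The rewritten query is also missing from this copy; a sketch of the NOT EXISTS form would be:

```sql
-- Sketch of the NOT EXISTS rewrite: the correlated subquery is
-- NULL-safe, so no row count spool is needed to handle NULLs.
SELECT ds.JobName
FROM dbo.DsJobStat AS ds
WHERE NOT EXISTS (
    SELECT 1
    FROM dbo.DsAvg AS a
    WHERE a.JobName = ds.JobName
);
```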
The spool is no longer executed 212 million times, and the query probably now has the behavior the vendor intended. It executes in 34516 ms of CPU time and 41132 ms of elapsed time. The majority of the time is spent scanning 212 million rows from the index.
That index scan is very unfortunate for this query. On average we have 1000 rows per unique value of JobName, and after reading the first row for a value we already know whether we'll need the remaining rows for that value. We almost never need those rows, but we still have to scan them anyway. If we know that the needed rows aren't very dense in the table and that almost all of them will be eliminated by the join, we can imagine a more efficient IO pattern on the index. What if SQL Server read the first row per unique value of JobName, checked whether that value was in DsAvg, and simply skipped ahead to the next value of JobName if it was? Instead of scanning 212 million rows, a seek plan requiring around 200k executions could be used instead.
This can mostly be accomplished with recursion, using a technique that Paul White pioneered that's described here. We can use recursion to produce the IO pattern that I described above:
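The original code block did not survive in this copy; the following is a sketch of the recursive-CTE skip-scan pattern being described, using the table and column names from this question (the exact query Joe ran may differ). TOP isn't allowed in a recursive member, so the standard workaround is ROW_NUMBER filtered to 1:

```sql
WITH JobNames AS
(
    -- Anchor: one seek for the lowest JobName in the table.
    SELECT JobName = MIN(ds.JobName)
    FROM dbo.DsJobStat AS ds

    UNION ALL

    -- Recursive step: seek to the next JobName above the previous one.
    -- ROW_NUMBER() ... = 1 stands in for the disallowed TOP (1).
    SELECT q.JobName
    FROM
    (
        SELECT ds.JobName,
               rn = ROW_NUMBER() OVER (ORDER BY ds.JobName)
        FROM dbo.DsJobStat AS ds
        INNER JOIN JobNames AS jn
            ON ds.JobName > jn.JobName
    ) AS q
    WHERE q.rn = 1
)
SELECT ds.JobName AS JobName
     , AJF.ApplGroup AS GroupName
     , ds.JobStatus AS JobStatus
     , AVG(CAST(ds.ElapsedSec AS FLOAT)) AS ElapsedSecAVG
     , AVG(CAST(ds.CpuMSec AS FLOAT)) AS CpuMSecAVG
FROM JobNames AS jn
INNER JOIN dbo.DsJobStat AS ds
    ON ds.JobName = jn.JobName
INNER JOIN dbo.AJF
    ON ds.NumericOrderNo = AJF.OrderNo
   AND ds.Odate = AJF.Odate
WHERE NOT EXISTS (SELECT 1 FROM dbo.DsAvg AS a WHERE a.JobName = jn.JobName)
GROUP BY ds.JobName, AJF.ApplGroup, ds.JobStatus
HAVING AVG(CAST(ds.ElapsedSec AS FLOAT)) <> 0
OPTION (MAXRECURSION 0);
```

OPTION (MAXRECURSION 0) is needed because the default limit of 100 recursions would stop far short of the roughly 200k distinct JobName values.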
That query is a lot to look at, so I recommend carefully examining the actual plan. First we do 200002 index seeks against the index on DsJobStat to get all of the unique JobName values. Then we join to DsAvg and eliminate all rows but one. For the remaining row, we join back to DsJobStat and get all of the required columns.
The IO pattern totally changes: instead of one scan that reads all 212 million rows from DsJobStat, the recursive plan performs around 200k cheap index seeks against it.
On my machine, the new query executes in just 6891 ms of CPU time and 7107 ms of elapsed time. Note that needing to use recursion in this way suggests that something is missing from the data model (or maybe it was just unstated in the posted question). If there is a relatively small table that contains all possible JobName values, it will be much better to use that table than to recurse over the big table. What it boils down to is this: if you have a result set containing all of the JobName values that you need, you can use index seeks to get the rest of the missing columns. You can't do that with a result set of JobName values that you DON'T need.