Sql-server – How to improve this query plan

performancequery-performancesql serversql-server-2008-r2

I have inherited a system and have been looking into a SQL Agent job that was running in around 15-20 mins that is now failing to complete. The SQL Server was moved from a Physical to Virtual environment. On the virtual environment the job was running about the same time as on the Physical environment. That was until a few days ago when it wasn't completing. Looking at the job one of the steps is executing a stored proc. Within that Proc it's then Inserting everything from a view into a temp table as well as other tables from the other database on the server.

Looking at the view is making me ill, as it looks like it's not fit for purpose. I did a SELECT TOP 10 * FROM the view and it took 54 mins to run.

I noticed that the TempDB Drive was flatlining when we ran this query and that the latency was up in the hundreds for Read/Write.

I noticed that the estimate and actual rows was way out and updated the stats on all tables in the view. Funny enough I have noticed now the latency has dropped down since doing this. I have ran Crystal Disk Mark on the old physical (Left) and this new virtual server (right).

enter image description here

There seems to be a significant difference in the Sequential Read/Writes and differences in the Random Read/Writes between the H:\ Drive on the old Physical and the T:\ Drive on the new virtual. The Random Reads seem to be performing better on the new server but the writes are slower. I’ve highlighted these drives as this is where the TempDB is situated on both servers.

I'm in a fight with the SAN guys at the moment, as they are saying everything is fine at their end, even though I've graphed the perfmon files and can show that disk latency went through the roof last week on a specific day and not sure if it's coincidence or not but the jobs that are not completing started to fail after this spike in the SAN. Disk Latency is averaging now about 80-100 ms, so way over the 20ms acceptable level.

I've uploaded the plan (just the view).

Does anyone have any tips on how to tune this query please? It makes me feel faint just looking at this and knowing it's probably in other stored procs across the estate. It's selecting 100 million rows to bring back 10.

I have updated the stats and this seems to have dropped the latency on the disks, though we also enabled Optimize for Ad hoc workloads.

enter image description here

View definition

--Removed comments to protect the guilty
CREATE  VIEW [dbo].[vw_PPM_CompletionSummary_Resource_ID_Added]  
  
AS       
  
  
SELECT   
    SD.fld_str_Name AS 'Division',  
    DM.fld_str_Name AS 'DivManager',  
    SC.fld_str_Name AS 'Cluster',  
    OM.fld_str_Name AS 'OpsManager',  
    S.fld_int_ID AS 'StoreID',  
    S.fld_str_Name AS 'Store',  
    ST.fld_str_Description AS 'StoreType',  
    PT.fld_str_Name AS 'PPMType',  
    PT.fld_int_ID AS 'PPMTypeID',  
    MT.fld_str_Name AS 'Discipline',  
    MT.fld_int_ID AS 'DisciplineID',  
    SL.fld_int_ID AS 'PPMRef',  
    SL.fld_dat_WeekEnding AS 'WeekEnding',  
    WN.WeekNumber AS 'SchedWeekNo',  
    YEAR(SL.fld_dat_WeekEnding) AS 'SchedYear',  
    1 AS 'PPMDue',  
    -- case when SL.fld_int_StatusID = 2 then 1 else 0 end as 'PPMComplete',  
    CASE WHEN SL.fld_int_StatusID = 2 THEN 1 WHEN ProjComp.ProjComp IS NOT NULL THEN 1 ELSE 0 END AS 'PPMComplete',  
    CASE WHEN PT.fld_int_ID IN (4,5,7,6,99,97,407,408) THEN 1 ELSE 0 END  AS 'IncludeInKPI', --Bakery,Catering,HVAC Service,Refrigeration,PPM-Call Point Testing, PPM-Sprinkler Valve Weekly Test, Catering Minor, Bakery Minor  
    CASE WHEN PT.fld_int_ResourceTypeID IN (1,2,3) THEN 1 ELSE 0 END AS 'CityConfiguredPPMType', --IST,Engineer,Alliance  
    CASE WHEN PT.fld_int_ResourceID IN (SELECT fld_int_ID FROM Prod.dbo.tblHelpDeskResource (NOLOCK) WHERE fld_str_Name LIKE 'jungh%')  
    THEN 1 ELSE 0 END AS 'JungheinrichConfiguredPPM',  
    RD.CalloutEngineerID,  
    RD.CalloutEngineer AS 'CalloutEngineer',  
    RD.CalloutDate AS 'CalloutDate',  
    RD.CompletedBy,  
    CASE WHEN RD.CompleteDate IS NOT NULL THEN   
    CASE WHEN ProjComp.ProjComp < RD.CompleteDate THEN   
    ProjComp.ProjComp   
    ELSE   
    RD.CompleteDate END   
    ELSE   
    ProjComp.ProjComp   
    END AS 'CompleteDate',  
    CASE WHEN RD.CompleteDate IS NOT NULL THEN   
    CASE WHEN ProjComp.ProjComp < RD.CompleteDate THEN   
    DATEPART(wk, ProjComp.ProjComp)   
    ELSE   
    RD.CompletedWeek   
    END  
    ELSE    
    DATEPART(wk, ProjComp.ProjComp)   
    END AS 'CompletedWeek',  
    PT.fld_int_Frequency AS 'PPMFrequency',  
    CalloutEngineerType,  
    CalloutEngineerSubType  
  
FROM    
    Prod.dbo.tblPPMScheduleLine SL (NOLOCK) JOIN  
    Prod.dbo.tblProjectHeader PH (NOLOCK) ON PH.fld_int_PPMScheduleRef = SL.fld_int_ID JOIN   
    dbo.tblPPMReportWeekNumbers WN (NOLOCK) ON SL.fld_dat_WeekEnding = WN.WeekEnding JOIN  
    Prod.dbo.tblStore S (NOLOCK) ON SL.fld_int_StoreID = S.fld_int_ID JOIN  
    Prod.dbo.tblStoreType ST (NOLOCK) ON S.fld_int_StoreTypeID = ST.fld_int_ID JOIN  
    Prod.dbo.tblPPMType PT (NOLOCK) ON SL.fld_int_PPMTypeID = PT.fld_int_ID JOIN  
    Prod.dbo.vw_tblStoreCluster_MI SC (NOLOCK) ON S.fld_int_StoreClusterID = SC.fld_int_ID JOIN  
    Prod.dbo.tblStoreDivision SD (NOLOCK) ON SC.fld_int_StoreDivisionID = SD.fld_int_ID LEFT JOIN  
    Prod.dbo.tblHelpDeskResource OM (NOLOCK) ON SC.fld_int_AreaManagerID = OM.fld_int_ID LEFT JOIN  
    Prod.dbo.tblHelpDeskResource DM (NOLOCK) ON SD.fld_int_DivisionalManagerID = DM.fld_int_ID JOIN  
    Prod.dbo.tblAssetMainType MT (NOLOCK) ON PT.fld_int_MainTypeID = MT.fld_int_ID LEFT JOIN  
    (  
    SELECT DISTINCT   
    SL.fld_int_ID AS 'PPMRef',   
    ISNULL(lastcompletedcallout.CalloutEngineerID,callout.CalloutEngineerID) AS 'CalloutEngineerID',  
    ISNULL(lastcompletedcallout.CalloutEngineer,callout.CalloutEngineer) AS 'CalloutEngineer',  
    ISNULL(lastcompletedcallout.CalloutDate,callout.CalloutDate) AS 'CalloutDate',  
    ISNULL(lastcompletedcallout.CompletedBy,callout.CompletedBy) AS 'CompletedBy',  
    ISNULL(lastcompletedcallout.CompleteDate,callout.CompleteDate) AS 'CompleteDate',  
    ISNULL(lastcompletedcallout.CompletedWeek,callout.CompletedWeek) AS 'CompletedWeek',  
    ISNULL(lastcompletedcallout.CalloutEngineerType,callout.CalloutEngineerType) AS 'CalloutEngineerType',  
    ISNULL(lastcompletedcallout.CalloutEngineerSubType,callout.CalloutEngineerSubType) AS 'CalloutEngineerSubType'  
    FROM   
    Prod.dbo.tblPPMScheduleLine SL (NOLOCK) JOIN  
    Prod.dbo.tblProjectHeader PH (NOLOCK) ON SL.fld_int_ID = PH.fld_int_PPMScheduleRef LEFT JOIN  
    (  
    SELECT   
    PH.fld_int_ID AS 'ProjectHeaderID',  
    R.fld_int_ID AS 'CalloutEngineerID',  
    R.fld_str_Name AS 'CalloutEngineer',  
    RT.fld_str_Name AS 'CalloutEngineerType',  
    RST.fld_str_Name AS 'CalloutEngineerSubType',  
    CE.fld_dat_CreatedDate AS 'CalloutDate',  
  
  
    --amended to only show data when it is completed event status SW 2017-02-21 -----------------------------------  
    CASE WHEN ce.fld_int_StatusID = 3 THEN CE.fld_str_LastEditedBy END AS 'CompletedBy',  
    CASE WHEN ce.fld_int_StatusID = 3 THEN CE.fld_dat_LastEdited END AS  'CompleteDate',  
    CASE WHEN ce.fld_int_StatusID = 3 THEN  DATEPART(wk,CE.fld_dat_LastEdited) END AS 'CompletedWeek'  
  
    FROM    
    (  
        SELECT   
        CP.fld_int_ProjectHeaderID,  
        MIN(CE.fld_int_ID) AS 'FirstCallout'  
        FROM   
        Prod.dbo.tblChildProject CP (NOLOCK) JOIN  
        Prod.dbo.tblPPMResourceCalloutEvent CE (NOLOCK) ON CP.fld_int_ID = CE.fld_int_ChildProjectID  
        WHERE   
        fld_int_StatusID IN (1,3) --called out, complete  
        GROUP BY   
        CP.fld_int_ProjectHeaderID  
  
    )FC JOIN  
    Prod.dbo.tblPPMResourceCalloutEvent CE (NOLOCK) ON FC.FirstCallout = CE.fld_int_ID  JOIN   
    Prod.dbo.tblChildProject CP (NOLOCK) ON CE.fld_int_ChildProjectID = CP.fld_int_ID JOIN  
    Prod.dbo.tblProjectHeader PH (NOLOCK) ON CP.fld_int_ProjectHeaderID = PH.fld_int_ID LEFT JOIN  
    Prod.dbo.tblHelpDeskResource R (NOLOCK) ON CE.fld_int_ResourceID = R.fld_int_ID LEFT JOIN   
    Prod.dbo.tblHelpDeskResourceType RT (NOLOCK) ON R.fld_int_HelpDeskResourceTypeID=RT.fld_int_ID LEFT JOIN   
    Prod.dbo.tblHelpDeskResourceType RST (NOLOCK) ON R.fld_int_HelpdeskResourceSubTypeID=RST.fld_int_ID  
       
   ) callout ON PH.fld_int_ID = callout.ProjectHeaderID LEFT JOIN --first callout for any not completed  
   (  
    SELECT   
        PH.fld_int_ID AS 'ProjectHeaderID',  
        R.fld_int_ID AS 'CalloutEngineerID',  
        R.fld_str_Name AS 'CalloutEngineer',  
        RT.fld_str_Name AS 'CalloutEngineerType',  
        RST.fld_str_Name AS 'CalloutEngineerSubType',  
        CE.fld_dat_CreatedDate AS 'CalloutDate',  
        CE.fld_str_LastEditedBy AS 'CompletedBy',  
        CE.fld_dat_LastEdited AS 'CompleteDate',  
        DATEPART(wk,CE.fld_dat_LastEdited) AS 'CompletedWeek'  
  
    FROM   
        (  
      SELECT   
       a.fld_int_ProjectHeaderID,   
       a.LastEdited,   
       MAX(b.CalloutEventID) AS 'LastCompletedCallout' --if there are potentially > 1 completed callout events with the same Last Edited Date, this picks the max one.  
      FROM   
       (  
        SELECT   
         CP.fld_int_ProjectHeaderID,  
         MAX(ce.fld_dat_LastEdited) AS LastEdited --datetime of the last edited completed callout event  
        FROM   
         Prod.dbo.tblChildProject CP (NOLOCK) JOIN  
         Prod.dbo.tblPPMResourceCalloutEvent CE (NOLOCK) ON CP.fld_int_ID = CE.fld_int_ChildProjectID  
        WHERE   
         fld_int_StatusID = 3  AND -- complete  
         ce.fld_dat_DateCancelled IS NULL -- can be complete and cancelled at once...  
        GROUP BY   
         CP.fld_int_ProjectHeaderID  
  
       ) a LEFT JOIN   
       (   
        SELECT   
         CP.fld_int_ProjectHeaderID,  
         ce.fld_int_ID AS  CalloutEventID,  
         ce.fld_dat_LastEdited AS OtherLastEdited  
  
        FROM   
         Prod.dbo.tblChildProject CP (NOLOCK) JOIN  
         Prod.dbo.tblPPMResourceCalloutEvent CE (NOLOCK) ON CP.fld_int_ID = CE.fld_int_ChildProjectID  
        WHERE   
         fld_int_StatusID = 3  AND -- complete  
         ce.fld_dat_DateCancelled IS NULL -- can be complete and cancelled at once...  
  
       ) b ON a.fld_int_ProjectHeaderID = b.fld_int_ProjectHeaderID AND   
        a.LastEdited = b.OtherLastEdited  
      GROUP BY   
       a.fld_int_ProjectHeaderID,   
       a.LastEdited  
  
  
        --SELECT   
        --CP.fld_int_ProjectHeaderID,  
        --MAX(CE.fld_dat_LastEdited) AS 'LastCompletedCalloutDate'  
        --FROM   
        --Prod.dbo.tblChildProject CP (NOLOCK) JOIN  
        --Prod.dbo.tblPPMResourceCalloutEvent CE (NOLOCK) ON CP.fld_int_ID = CE.fld_int_ChildProjectID  
        --WHERE   
        --fld_int_StatusID = 3 -- complete  
        --GROUP BY   
        --CP.fld_int_ProjectHeaderID  
  
     )FC JOIN  
     Prod.dbo.tblPPMResourceCalloutEvent CE (NOLOCK)  ON FC.LastCompletedCallout = CE.fld_int_ID  JOIN   
     Prod.dbo.tblChildProject CP (NOLOCK) ON CE.fld_int_ChildProjectID = CP.fld_int_ID JOIN  
     Prod.dbo.tblProjectHeader PH (NOLOCK) ON CP.fld_int_ProjectHeaderID = PH.fld_int_ID LEFT JOIN  
     Prod.dbo.tblHelpDeskResource R (NOLOCK) ON CE.fld_int_ResourceID = R.fld_int_ID LEFT JOIN   
     Prod.dbo.tblHelpDeskResourceType RT (NOLOCK) ON R.fld_int_HelpDeskResourceTypeID=RT.fld_int_ID LEFT JOIN   
     Prod.dbo.tblHelpDeskResourceType RST (NOLOCK) ON R.fld_int_HelpdeskResourceSubTypeID=RST.fld_int_ID  
       
    )lastcompletedcallout ON PH.fld_int_ID = lastcompletedcallout.ProjectHeaderID  
    WHERE    
    SL.fld_int_StatusID <> 3 AND --ppm not cancelled   
    callout.CalloutEngineer IS NOT NULL  
  
    ) RD ON SL.fld_int_ID = RD.PPMRef LEFT JOIN  
    (  
    SELECT   
    ph.fld_int_PPMScheduleRef AS 'PPM_Ref',   
    MIN(ph.fld_dat_CompleteDate) AS 'ProjComp'  
    FROM   
    Prod.dbo.tblProjectHeader ph (NOLOCK)  
    WHERE   
    ph.fld_dat_CompleteDate IS NOT NULL  
    GROUP BY   
    ph.fld_int_PPMScheduleRef  
  
    ) ProjComp ON SL.fld_int_ID = ProjComp.PPM_Ref  
WHERE   
    SL.fld_int_StatusID <> 3 AND --ppm not cancelled  
  fld_int_ProjectStatusID NOT IN (6, 7) AND --project header not cancelled too  
    SL.fld_dat_WeekEnding <= DATEADD(dd, DATEDIFF(dd, 0, GETDATE()), 0) --less than today  
         

Best Answer

One of the most important things is to always test a query that's identical (or as similar to as possible) to the one with the performance issue. Sometimes people try to test a simple SELECT query instead of they add a TOP to the query thinking that if the query with the TOP can't perform well then the full query certainly can't perform well. That isn't always true. Sometimes adding a TOP can make performance worse, as explained here.

Another important thing is to give yourself the tools which give you the best chance for success when tuning a query. The query plan that you're looking at isn't a simple one and I can understand why you might be intimidated by it. You should strongly consider upgrading your version of SSMS (if applicable) and downloading the free Sentry One Plan Explorer. That'll give you more information and make it easier to performance tune queries like the one that you have here.

Here's the first thing that I notice using Plan Explorer:

enter image description here

It looks like you have mismatched data types for SL.fld_dat_WeekEnding and WN.WeekEnding. That can cause issues with cardinality estimation. The query optimizer estimates that your "less than today" filter will exclude nearly all of the rows, but of course that isn't true. The row estimate is 1 but the actual number of rows is in the million. That type of disparity can cause performance issues because the query optimizer is working under bad information. It might choose a different, more efficient plan if it was working with better estimates.

In review:

  1. Test with the real query whenever possible.
  2. Use the best tools available to make your job easier.
  3. Identify and fix cardinality estimates and see how the plan changes. Fixing one issue might reveal another, so be prepared for an iterative process.