Sql-server – T-SQL performance tuning – Query running slower

performancequery-performancesql serversql-server-2012

I've been tasked by the dev manager as a DBA to look into performance tuning the code as attached in the link provided below.

Basically, this is a stored proc that uses a CTE and then joins the CTE with various tables. However, the stored proc execution is surpassing way more than 9 minutes and I need to basically bring the execution time down as much as possible either by rewriting the code or adding indexes.

Looking at the actual execution plan it seems that the most expensive operation seems to be "sort" operation incurring 97% of the overall cost. We have a non clustered index on the sorted column but the sort order is asc and the code does a desc.

I am thinking of changing the sort order in the index to desc but again it uses a row_number() function which requires an order by to be explicit.

Is there a way I can reduce the cost or rewrite the query or maybe add an index or two to reduce the execution time?

;WITH cte_MaxStatus as
(
       SELECT 
              QueuedTxUpdateID, ObjectID, QueuedAt, ErrorMessage,fkSfUpdateStatus, fkSfUpdateAction, ObjectType, ProcessedAt,
              rn = ROW_NUMBER() OVER(PARTITION BY ObjectType, ObjectID ORDER BY QueuedAt DESC)
       FROM dbo.SfQueuedTxUpdates
)



SELECT   MOT.ObjectName,/*mot.Description, */ ObjectID, QueuedAt, ErrorMessage, ua.SfUpdateAction, us.SfUpdateStatus, cte_MaxStatus.ProcessedAt
FROM cte_MaxStatus
       LEFT JOIN dbo.SfUpdateAction UA ON cte_MaxStatus.fkSfUpdateAction = ua.SfUpdateActionID
       LEFT JOIN dbo.SfUpdateStatus US ON cte_MaxStatus.fkSfUpdateStatus = US.SfUpdateStatusID
       LEFT JOIN DBZ.dbo.ModuleObjectTypes MOT ON cte_MaxStatus.ObjectType = MOT.ObjectTypeID
WHERE fkSfUpdateStatus <> 3
              AND cte_MaxStatus.rn = 1

Full SQL Script

SQL PLAN

Best Answer

It's worth pointing out that the estimated cost of the sort (268491.28 optimizer units) probably isn't realistic. The query optimizer thinks that it will need to sort 180 GB of data at that step. However, the sort is performed with a memory grant of 24 GB and it doesn't spill to disk. In fact, the most used memory during the plan is 8 GB. With that said, based on the information that we have the sort is probably the performance bottleneck.

It should be possible to define an index that removes the need for that sort. That's a perfectly valid optimization strategy. If you're looking for an alternative you could consider a parallel apply pattern. The idea is to use parallel nested loops to do a bunch of little sorts instead of one large sort which scales better. I don't know how well it will work with MAXDOP 2 and I can't see your data, but it could be worth a try. Something like this might work:

SELECT MOT.ObjectName,/*mot.Description, */ ObjectID, QueuedAt, ErrorMessage, ua.SfUpdateAction, us.SfUpdateStatus, cte_MaxStatus.ProcessedAt
FROM DBZ.dbo.ModuleObjectTypes MOT
CROSS APPLY
(
    SELECT 
    s.QueuedTxUpdateID, s.ObjectID, s.QueuedAt, s.ErrorMessage, s.fkSfUpdateStatus, s.fkSfUpdateAction, s.ObjectType, s.ProcessedAt,
    rn = ROW_NUMBER() OVER(PARTITION BY s.ObjectID ORDER BY s.QueuedAt DESC)
    FROM dbo.SfQueuedTxUpdates s
    WHERE s.ObjectType = MOT.ObjectTypeID
) cte_MaxStatus
       LEFT JOIN dbo.SfUpdateAction UA ON cte_MaxStatus.fkSfUpdateAction = ua.SfUpdateActionID
       LEFT JOIN dbo.SfUpdateStatus US ON cte_MaxStatus.fkSfUpdateStatus = US.SfUpdateStatusID
WHERE fkSfUpdateStatus <> 3
AND cte_MaxStatus.rn = 1;

In the query plan, what you're looking for is nested loops with index seeks against SfQueuedTxUpdates on the inner side. There should be one sort per ObjectTypeID instead of one large sort. The query above may not be sufficient to get a nested loop join plan. You may need to use a superfluous TOP or other tricks (see the linked video for an explanation of the technique).

Note that I'm making an assumption that doing an INNER JOIN to DBZ.dbo.ModuleObjectTypes won't change the results. If that's not true, you could put distinct ObjectType values from SfQueuedTxUpdates into a temp table. Using recursion to get those distinct values can be more efficient than scanning the entire index. It depends on the ratio of rows in the table to distinct values.