Sql-server – T-SQL performance tuning – Query running slower

performancequery-performancesql serversql-server-2012

I've been tasked by the dev manager as a DBA to look into performance tuning the code as attached in the link provided below.

Basically, this is a stored proc that uses a CTE and then joins the CTE with various tables. However, the stored proc execution is surpassing way more than 9 minutes and I need to basically bring the execution time down as much as possible either by rewriting the code or adding indexes.

Looking at the actual execution plan it seems that the most expensive operation seems to be "sort" operation incurring 97% of the overall cost. We have a non clustered index on the sorted column but the sort order is asc and the code does a desc.

I am thinking of changing the sort order in the index to desc but again it uses a row_number() function which requires an order by to be explicit.

Is there a way I can reduce the cost or rewrite the query or maybe add an index or two to reduce the execution time?

;WITH cte_MaxStatus as
(
       SELECT 
              QueuedTxUpdateID, ObjectID, QueuedAt, ErrorMessage,fkSfUpdateStatus, fkSfUpdateAction, ObjectType, ProcessedAt,
              rn = ROW_NUMBER() OVER(PARTITION BY ObjectType, ObjectID ORDER BY QueuedAt DESC)
       FROM dbo.SfQueuedTxUpdates
)



SELECT   MOT.ObjectName,/*mot.Description, */ ObjectID, QueuedAt, ErrorMessage, ua.SfUpdateAction, us.SfUpdateStatus, cte_MaxStatus.ProcessedAt
FROM cte_MaxStatus
       LEFT JOIN dbo.SfUpdateAction UA ON cte_MaxStatus.fkSfUpdateAction = ua.SfUpdateActionID
       LEFT JOIN dbo.SfUpdateStatus US ON cte_MaxStatus.fkSfUpdateStatus = US.SfUpdateStatusID
       LEFT JOIN DBZ.dbo.ModuleObjectTypes MOT ON cte_MaxStatus.ObjectType = MOT.ObjectTypeID
WHERE fkSfUpdateStatus <> 3
              AND cte_MaxStatus.rn = 1

Full SQL Script

SQL PLAN

Best Answer

It's worth pointing out that the estimated cost of the sort (268491.28 optimizer units) probably isn't realistic. The query optimizer thinks that it will need to sort 180 GB of data at that step. However, the sort is performed with a memory grant of 24 GB and it doesn't spill to disk. In fact, the most used memory during the plan is 8 GB. With that said, based on the information that we have the sort is probably the performance bottleneck.

It should be possible to define an index that removes the need for that sort. That's a perfectly valid optimization strategy. If you're looking for an alternative you could consider a parallel apply pattern. The idea is to use parallel nested loops to do a bunch of little sorts instead of one large sort which scales better. I don't know how well it will work with MAXDOP 2 and I can't see your data, but it could be worth a try. Something like this might work:

SELECT MOT.ObjectName,/*mot.Description, */ ObjectID, QueuedAt, ErrorMessage, ua.SfUpdateAction, us.SfUpdateStatus, cte_MaxStatus.ProcessedAt
FROM DBZ.dbo.ModuleObjectTypes MOT
CROSS APPLY
(
    SELECT 
    s.QueuedTxUpdateID, s.ObjectID, s.QueuedAt, s.ErrorMessage, s.fkSfUpdateStatus, s.fkSfUpdateAction, s.ObjectType, s.ProcessedAt,
    rn = ROW_NUMBER() OVER(PARTITION BY s.ObjectID ORDER BY s.QueuedAt DESC)
    FROM dbo.SfQueuedTxUpdates s
    WHERE s.ObjectType = MOT.ObjectTypeID
) cte_MaxStatus
       LEFT JOIN dbo.SfUpdateAction UA ON cte_MaxStatus.fkSfUpdateAction = ua.SfUpdateActionID
       LEFT JOIN dbo.SfUpdateStatus US ON cte_MaxStatus.fkSfUpdateStatus = US.SfUpdateStatusID
WHERE fkSfUpdateStatus <> 3
AND cte_MaxStatus.rn = 1;

In the query plan, what you're looking for is nested loops with index seeks against SfQueuedTxUpdates on the inner side. There should be one sort per ObjectTypeID instead of one large sort. The query above may not be sufficient to get a nested loop join plan. You may need to use a superfluous TOP or other tricks (see the linked video for an explanation of the technique).

Note that I'm making an assumption that doing an INNER JOIN to DBZ.dbo.ModuleObjectTypes won't change the results. If that's not true, you could put distinct ObjectType values from SfQueuedTxUpdates into a temp table. Using recursion to get those distinct values can be more efficient than scanning the entire index. It depends on the ratio of rows in the table to distinct values.

Related Solutions

Sql-server – Performance tuning a cascading delete with many foreign keys

To answer your main question directly, the sorts are there to present rows to update operators (performing deletions in this case) in index key order. The principle at work here is that sorting on the keys will promote sequential access to the index.

This can be a good optimization, though the details depend on your hardware, how likely the affected pages are to be in memory, and whether the sorts can complete within the memory allocated to them. When the optimizer decides the cost of sorting will be paid back by the increased efficiencies associated with sequential index access, it sets a property DMLRequestSort on the update operator:

DML Request Sort

The optimizer may also decide to split the update into separate operators to maintain the clustered index (or heap) and then the nonclustered indexes. often, it will decide to sort more than once - first for the clustered index keys, and then again for the nonclustered index(es). Again, where sorting is considered optimal, each index update operator will have the DMLRequestSort property set to true.

All that said, the things I would fix first would be to eliminate the index scans where the join operator they feed is a nested loops join, and to remove the eager index spools, which are inserting rows into an empty index every time the query is executed. An eager index spool is often the clearest possible sign that you are missing a useful permanent index. The seek predicate in the index spool operator identifies the keys the optimizer would like an index on.

Examples of tables that are missing a nonclustered index (requiring an eager index spool) are:

child6gc8Selections
gc9s
child7s
gc6s

eager index spools

Examples of tables that are currently being scanned below a nested loops join are:

child1
parentObjectMessages
child8s
child7s
child6s
child5s
child4s
child3s
child2s

scan below nested loops

Taking the example shown above, the Clustered Index Scan has an output list of Id, parentObjectId, the Nested Loops Join predicate is child7s.parentObjectId = parentObject.Id, and the join output column list is child7s.Id.

From that information, it seems a good access method (index) on child7s for this part of the query would be keyed on parentObjectId with Id as an included column. You should be able to work out how best to work this into your existing indexing strategy.

The following are examples of tables where the optimizer is currently choosing a hash join. I would check tables like this to ensure that is a reasonable access method:

child6gc8Selections
gc2s
gc5s
gc6Properties

hash joins

The table child2bigChild also participates in a merge join where an explicit sort is necessary. Again, I would check to see if this sort could be avoided.

sort before merge

Once the basic indexing issues are resolved, we can look at other optimizations if necessary.

Sql-server – SHOWPLAN does not display a warning but “Include Execution Plan” does for the same query

This:

SET SHOWPLAN_XML ON;
GO
SELECT * FROM sys.objects;
GO

Is equivalent to pressing Display Estimated Execution Plan on the toolbar (or hitting Ctrl + L). You'll notice that no rows are returned from the query, like there is when you use Include Actual Execution Plan (Ctrl + M).

The spill warning is only a runtime warning. There is no way that SQL Server can know, when displaying the estimated plan, that a spill will happen at runtime. This is because a spill is caused by factors that might only be present during certain invocations of the query (for example, when there is memory pressure). The estimated plan knows roughly how much memory it's going to ask for, but it can't know until execution that it isn't going to get it.

As an aside, may I recommend* our free tool, SQL Sentry Plan Explorer? I think it provides much more obvious information than Management Studio. I recently wrote a lengthy blog post that can act as a tutorial, and Jonathan Kehayias has a great PluralSight course on it as well.

_{* Disclaimer: I work for SQL Sentry.}

Best Answer

Related Solutions

Sql-server – Performance tuning a cascading delete with many foreign keys

Sql-server – SHOWPLAN does not display a warning but “Include Execution Plan” does for the same query

Related Question