SQL Server – When and Where to Start Query Tuning in SQL Server

Tags: performance, query-performance, sql-server

Execution Plan XML for the Query

I am completely new to query tuning in SQL Server. I have gone through many online resources (such as how to interpret the execution plan of a query) to get familiar with the query tuning process, but when I try it hands-on on a simple query in my environment, I get stuck and confused. I am sharing the query I am trying to make faster. Currently it takes around 10 minutes to run but does not return any records.

I also want to know how to identify when a query actually needs tuning, since sometimes tuning can be a complete waste of time. Are there criteria to decide that?

Any suggestions or help regarding the approach/steps I should follow, or how I should look into the query to make it faster, are highly appreciated.

Please let me know if any clarification or additional information is needed. Thanks in advance!

Declare @Date1 datetime
Declare @Date2 datetime

-- Strip the time portion of getdate() via a string round trip, then append the day boundaries
select @Date1 = cast(replace(convert(varchar, getdate()-7, 102), '.', '-') + ' 00:00:00.000' as datetime)
select @Date2 = cast(replace(convert(varchar, getdate()-1, 102), '.', '-') + ' 23:59:59.000' as datetime)

select a.ticket_number, a.type, a.parent, a.summary, a.open_date,
    CASE CAST(CONVERT(VARCHAR(10), (DATEDIFF(DAY, GETDATE()-21, a.OPEN_DATE)/7) + 1) AS INT)
        WHEN 3 THEN 'Current'
        ELSE 'Previous'
    END as 'Open in Current / Previous week',
    a.last_mod_date, a.resolve_date, a.close_date, a.ass_group,
    a.opened_by, a.assignee, a.end_user, a.requestor, a.priority, a.ticket_status, a.urgency, a.category, a.SLA_violation, a.CI,
    a.others, a.rootcause, e.[Apps / Infra], e.[BU / Tower], e.[L1 / L2 / L3], e.Owner, e.[SDP Group Name], e.Service, e.[Sub BU / Tower],
    dateadd(ss, c.time_stamp, '1970-01-01') as Child_ticket_tag_date,
    'week' + convert(varchar(10), (DATEDIFF(day, GETDATE()-21, DATEADD(ss, c.time_stamp, '1970-01-01'))/7) + 1) as weeknum,
    c.action_desc
into #main
from current_ticket_details as a with (nolock)
join SDP_Master_apps as e with (nolock) on a.ass_group = e.[SDP Group Name]
left join Call_Req as b with (nolock) on a.ticket_number = b.ref_num
left join act_log as c with (nolock) on b.persid = c.call_req_id
where DATEADD(ss, c.time_stamp, '1970-01-01') between @Date1 and @Date2
and c.action_desc LIKE '%PARENT%'
and a.assignee not in (select email_id from mail_details where frequency = 'ALL' and tower = 'APPS' and parm_name = 'exclusion_list')
and a.type in ('I','R','CO')
and a.ticket_status not in ('test','cancelled')
and a.parent is not NULL
order by c.time_stamp

I am also sharing the SET STATISTICS IO, TIME output:

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 4 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.
Table 'act_log'. Scan count 28882, logical reads 2631931, physical reads 21343, read-ahead reads 78790, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'call_req'. Scan count 0, logical reads 220863, physical reads 14239, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'mail_details'. Scan count 2, logical reads 357920, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'Worktable'. Scan count 0, logical reads 0, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'current_ticket_details'. Scan count 1, logical reads 140156, physical reads 1093, read-ahead reads 94182, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'SDP_master_Apps'. Scan count 1, logical reads 8, physical reads 8, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

(0 row(s) affected)

(1 row(s) affected)

 SQL Server Execution Times:
   CPU time = 19579 ms,  elapsed time = 800395 ms.
SQL Server parse and compile time: 
   CPU time = 0 ms, elapsed time = 0 ms.

 SQL Server Execution Times:
   CPU time = 0 ms,  elapsed time = 0 ms.

I created a nonclustered index on the current_ticket_details table on the columns mentioned in the WHERE clause, and included the columns that appear in the SELECT clause. The optimizer then chose a nonclustered Index Scan instead of a Clustered Index Scan. That reduced the estimated operator cost a bit, but the execution time is still the same.

create nonclustered index ind_ctd_type_stat_ass_parent_include
on current_ticket_details (type, ticket_status, assignee, parent)
include (summary, open_date, last_mod_date, resolve_date, close_date, ass_group,
    opened_by, end_user, requestor, priority, urgency, category, SLA_violation, CI,
    others, rootcause)

Best Answer

I will walk you through how I approach performance tuning a query like this.

The first step is to read through the SQL query to see if there are any performance red flags. Your query joins five tables together and doesn't do anything really complex, so performance is very likely to be dominated by how the five tables are accessed (index or table scan), the join types (hash, nested loops, or merge), and the join order. If this query is running slower than it should, then the optimizer is probably getting at least one of those wrong. The most common reason the optimizer gets one of those wrong is bad cardinality estimates at one or more steps in the plan.

There are a few things in your query that are likely to contribute to bad cardinality estimates. The first is the use of local variables. SQL Server creates the plan without knowing what the values of those variables are, and a BETWEEN predicate with two unknown values gets a default cardinality estimate of 9%.

DATEADD(ss, c.time_stamp, '1970-01-01') between @Date1 and @Date2

You don't want that. You want the SQL Server query optimizer to have as much information as possible, including the real values of the variables. One way to do that is with an OPTION (RECOMPILE) hint. That can be a good option as long as this query isn't executed so many times per day that the compilation cost becomes an issue.
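As a minimal sketch of where the hint goes (the table and predicate are from your query; the inline DECLARE initialization assumes SQL Server 2008 or later, and I use midnight boundaries for brevity):

declare @Date1 datetime = dateadd(day, -7, cast(getdate() as date));
declare @Date2 datetime = dateadd(day, -1, cast(getdate() as date));

select count(*)
from act_log as c with (nolock)
where dateadd(ss, c.time_stamp, '1970-01-01') between @Date1 and @Date2
option (recompile); -- the plan is compiled with the actual values of @Date1 and @Date2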

The DATEADD in that predicate is likely to contribute to bad cardinality estimates as well. The filter you have right now is an indirect way of getting what you want. What if, instead, you converted @Date1 and @Date2 to the same data type as time_stamp? Calculate the number of seconds since 1970-01-01 and use those as the variable values. That allows SQL Server to use an index on time_stamp (if one exists), and it can use the statistics on that column to better estimate the number of rows filtered out by that predicate.
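A sketch of that conversion (the variable names @ts1 and @ts2 are mine, and it assumes time_stamp stores whole seconds since the Unix epoch, which is what your DATEADD implies):

declare @ts1 int = datediff(ss, '1970-01-01', @Date1); -- range start as epoch seconds
declare @ts2 int = datediff(ss, '1970-01-01', @Date2); -- range end as epoch seconds

select c.call_req_id, c.action_desc
from act_log as c with (nolock)
where c.time_stamp between @ts1 and @ts2; -- time_stamp stands alone, so an index on it can seek and its statistics apply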

The cardinality estimator often gets better results if you filter on the values you want to keep instead of trying to filter out the values you don't. Is it possible to write this filter as an IN() that contains all of the values that qualify?

and a.ticket_status not in ('test','cancelled') 
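For example (the status values below are hypothetical placeholders, not taken from your data):

and a.ticket_status in ('open', 'in progress', 'resolved', 'closed') -- hypothetical list; substitute the statuses that actually qualify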

Those were the three things that jumped out at me. Now I will look at the plan. First, look at the total cost and what SQL Server thinks will be the most expensive operators. The clustered index scan on current_ticket_details has 93% of the cost, but the subtree cost isn't very high. Try running a simple query that scans that same table. Does it take around 800 seconds? If so, you have found the bottleneck and you may need to add an index to the current_ticket_details table. If not, the query plan was much more expensive than SQL Server thought it would be, which is useful information.
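A quick way to time that (a rough sketch; COUNT(*) may read the narrowest index rather than the clustered index, so treat it only as a lower bound for the scan cost):

set statistics time on;
select count(*) from current_ticket_details with (nolock); -- reads every row via some index or the clustered index
set statistics time off;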

Comparing the actual number of rows versus the estimated number of rows is extremely useful when analyzing many query plans. If there is a large difference at a step then it's possible that SQL Server made a suboptimal choice for operators to the left of that bad estimate. In your query plan, the final estimated row count is 1 row. Do you really expect this query to only insert one row? That is a red flag right there. You can read to the right in the plan to figure out where the estimate went bad. The estimated row count dropped to 1 after the nested loop join to mail_details (node 16). IMO that is the most likely cause of the problem.
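To get the actual row counts alongside the estimates, capture the actual plan when you run the statement, for example:

set statistics xml on; -- the result includes the actual plan with actual row counts per operator
-- run the query here
set statistics xml off;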

SQL Server went for nested loop joins and seeks on the other tables that were joined after it. Not surprisingly, some of those tables have a high number of logical reads. If that estimate is incorrect (easy to tell with an actual plan), then a good step is to fix that estimate. That may change the rest of the plan, and the query could finish faster. In addition, the nested loop joins and seeks have a very low cost. If the number of rows processed by them was much greater than expected, that would explain why the scan on current_ticket_details finished much faster than 800 seconds (I am assuming that it did).

To state it more clearly: the query optimizer expected to do a single index seek on act_log, but the statistics output shows a scan count of 28882 and 2,631,931 logical reads against act_log. That doesn't add up.

You haven't provided enough information to figure out exactly why that low estimate is happening for the anti join to mail_details. As someone else suggested, you should check that the statistics on that table are up to date. If they are, you could consider a filtered statistic or a computed column. One minor thing worth mentioning: NOT IN() returns no rows if the subquery returns at least one NULL value. If you don't need that behavior, you should rewrite that join as a NOT EXISTS. That will also eliminate the row count spool.
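A sketch of the NOT EXISTS rewrite (the alias md is mine):

and not exists (
    select 1
    from mail_details as md
    where md.email_id  = a.assignee
      and md.frequency = 'ALL'
      and md.tower     = 'APPS'
      and md.parm_name = 'exclusion_list'
) -- null-safe, unlike NOT IN, and it removes the row count spool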

In summary, rewrite these lines to enable the optimizer to get better cardinality estimates:

DATEADD(ss, c.time_stamp, '1970-01-01') between @Date1 and @Date2
and a.ticket_status not in ('test','cancelled')

and figure out why SQL Server thinks that only one row will remain after this join:

and a.assignee not in (select email_id from mail_details where frequency = 'ALL' and tower = 'APPS' and parm_name = 'exclusion_list')

If you address all of that I expect that you'll see a different (and hopefully better) query plan.