SQL Server – Improve reporting stored procedure execution time – tuning temporary tables

performance, performance-tuning, query-performance, sql-server, sql-server-2008-r2

I've been tasked with improving the performance of a reporting stored procedure (this is my first real-world performance tuning task). The procedure is called by an SSRS front end and currently takes about 30 seconds to run against the largest amount of data (based on filters set from the report front end).

This stored procedure runs 19 queries, most of which transform the data from an initial (legacy) format in the base tables into a meaningful dataset to be displayed to the business side.

I've created a query based on a few DMVs to find the most resource-consuming queries in the stored procedure (small snippet below), and I've found one query which takes about 10 seconds, on average, to complete.

select
    object_name(st.objectid)                                                                    [Procedure Name]
    , dense_rank() over (partition by st.objectid order by qs.last_elapsed_time desc)           [rank-execution time]
    , dense_rank() over (partition by st.objectid order by qs.last_logical_reads desc)          [rank-logical reads]
    , dense_rank() over (partition by st.objectid order by qs.last_worker_time desc)            [rank-worker (CPU) time]
    , dense_rank() over (partition by st.objectid order by qs.last_logical_writes desc)         [rank-logical write]
        ...
from sys.dm_exec_query_stats as qs
    cross apply sys.dm_exec_sql_text (qs.sql_handle) as st
    cross apply sys.dm_exec_text_query_plan (qs.plan_handle, qs.statement_start_offset, qs.statement_end_offset) as qp
where st.objectid in ( object_id('SuperDooperReportingProcedure') )
order by
    [rank-execution time]
    , [rank-logical reads]
    , [rank-worker (CPU) time]
    , [rank-logical write] desc
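
One caveat with the last_* columns used above: they reflect only the most recent execution, so the ranking can jump around between runs. A variant worth trying (a sketch along the same lines) ranks by the average over all executions instead:

select
    object_name(st.objectid)                                        [Procedure Name]
    , dense_rank() over (partition by st.objectid
        order by qs.total_elapsed_time / qs.execution_count desc)   [rank-avg execution time]
from sys.dm_exec_query_stats as qs
    cross apply sys.dm_exec_sql_text (qs.sql_handle) as st
where st.objectid = object_id('SuperDooperReportingProcedure')
order by [rank-avg execution time];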

Now, the query in question is a bit strange in the sense that the execution plan shows that the bulk of the work (~80%) is done when inserting the data into the local temporary table, not when interrogating the other tables from which the source data is taken and manipulated. (The screenshot below is from SQL Sentry Plan Explorer.)

[screenshot: execution plan of the query, from SQL Sentry Plan Explorer]

Also, the row estimates in the execution plan are way off: only 4,218 rows are actually inserted into the local temporary table, as opposed to the ~248k rows the plan thinks it's moving into it. Because of this, I'm thinking "statistics", but do those even matter if ~80% of the work is the actual insert into the table?
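
If stale statistics on the temporary tables turn out to matter after all, two cheap experiments (a sketch, using the temp table names from the query shown further down) are to refresh their statistics right before the expensive statement, or to add OPTION (RECOMPILE) to that one statement so it is planned with the row counts that actually exist at run time:

-- refresh temp table statistics just before the expensive SELECT ... INTO
update statistics #StoresBudgets;
update statistics #temp_report_data;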

One of my first recommendations was to rewrite the entire process and the stored procedure so that the moving and transforming of the data is not done inside the reporting stored procedure, but nightly, into persisted tables (real-time data is not required, only data through the end of the previous day). But the business side does not want to invest time and resources into redesigning this, and instead "suggests" I do performance tuning in the sense of finding where and what indexes I can add to speed this up.

I don't believe that adding indexes to the base tables will improve the performance of the report, since most of the time needed for running the query is spent saving the data into a temporary table. As I understand it, that insert hits tempdb, meaning the rows are written to disk, and the I/O latency adds time.
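
To verify how much of the elapsed time really is the insert, the statement can be run in isolation with I/O and timing statistics enabled (a measurement only, not a fix):

set statistics io on;
set statistics time on;

-- run just the SELECT ... INTO statement here; the Messages tab then
-- shows logical reads per table plus CPU and elapsed time, separating
-- the cost of reading the sources from the cost of the insert

set statistics time off;
set statistics io off;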

Even so, as I've mentioned, this is my first performance tuning task. I've tried to read as much as possible about this over the last couple of days, and these are my conclusions so far, but I'd like to ask a broader audience for advice and hopefully get a few more insights into what I can do to improve this procedure.

A few specific questions I'd appreciate answers to:

  • Is there anything incorrect in what I've said above (in my understanding of the DB or my assumptions)?
  • Is it true that adding an index to a temporary table will actually increase the execution time, since the table (and its associated indexes) is rebuilt on each execution? (See the sketch after this list.)
  • Could anything else be done in this scenario without rewriting the procedure/queries, only via indexes or other tuning methods? (I've read a few article headlines saying you can also "tune tempdb", but I haven't gotten into the details of those yet.)
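
On the second question, a way to test it directly: building an index on a temp table does add cost on every execution, but if the join it supports saves more than the build takes, it can still be a net win. A minimal sketch (the key columns are an assumption, mirrored from the join in the query below):

-- created once per execution, right after the temp tables are populated;
-- key columns mirror the join predicates in the expensive query
create clustered index cx_StoresBudgets
    on #StoresBudgets (StoreID, ProgramID);

create clustered index cx_temp_report_data
    on #temp_report_data (store_ID, newSourceID);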

Any help is very much appreciated, and if you need more details I'll be happy to post them.

Additional details:

The query in question is (partially) shown below. What's missing are a few more aggregate columns and their corresponding lines in the GROUP BY clause:

select
    b.ProgramName
    ,b.Region
    ,case when b.AM IS null and b.ProgramName IS not null 
        then 'Unassigned' 
        else b.AM 
    end as AM
    ,rtrim(ltrim(b.Store)) Store
    ,trd.Store_ID
    ,b.appliesToPeriod
    ,isnull(trd.countLeadActual,0) as Actual
    ,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date then b.budgetValue else 0 end),0) as Budget
    ,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date and (trd.considerMe = -1 or b.StoreID < 0) then b.budgetValue else 0 end),0) as CleanBudget
    ... 
into #SalvesVsBudgets
from #StoresBudgets b
    left join #temp_report_data trd on trd.store_ID = b.StoreID and trd.newSourceID = b.ProgramID
where (b.StoreDivision is not null or (b.StoreDivision is null and b.ProgramName = 'NewProgram'))
    group by
        b.ProgramName
        ,b.Region
        ,case when b.AM IS null and b.ProgramName IS not null 
            then 'Unassigned' 
            else b.AM 
        end
        ,rtrim(ltrim(b.Store))
        ,trd.Store_ID
        ,b.appliesToPeriod
        ,isnull(trd.countLeadActual,0)

I'm not sure if this is actually helpful, but I'm adding the information just in case:

  • the temporary tables have no indexes on them
  • RAM size: 32 GB

I have tried moving the CASE statements out of the aggregate-generating query, but unfortunately the overall procedure time has not noticeably improved; it still fluctuates within roughly 0.25 to 1.0 second of the original, both below and above the original version's time, which I'm guessing is due to variable workload on my machine.

The execution plan for the same query, modified to remove the CASE conditions and leave only the SUM aggregates, is now:

[screenshot: execution plan of the modified query]

Best Answer

You might be able to do a union instead of the or. That could prevent a table scan. (Note that with SELECT ... INTO, the INTO clause goes only in the first branch of the union; the combined result is inserted into the new table.)

select
    b.ProgramName
    ,b.Region
    ,case when b.AM IS null and b.ProgramName IS not null 
        then 'Unassigned' 
        else b.AM 
    end as AM
    ,rtrim(ltrim(b.Store)) Store
    ,trd.Store_ID
    ,b.appliesToPeriod
    ,isnull(trd.countLeadActual,0) as Actual
    ,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date then b.budgetValue else 0 end),0) as Budget
    ,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date and (trd.considerMe = -1 or b.StoreID < 0) then b.budgetValue else 0 end),0) as CleanBudget
    ... 
into #SalvesVsBudgets
from #StoresBudgets b
    left join #temp_report_data trd on trd.store_ID = b.StoreID and trd.newSourceID = b.ProgramID
where (b.StoreDivision is not null)
    group by
        b.ProgramName
        ,b.Region
        ,case when b.AM IS null and b.ProgramName IS not null 
            then 'Unassigned' 
            else b.AM 
        end
        ,rtrim(ltrim(b.Store))
        ,trd.Store_ID
        ,b.appliesToPeriod
        ,isnull(trd.countLeadActual,0)

union

select
    b.ProgramName
    ,b.Region
    ,case when b.AM IS null and b.ProgramName IS not null 
        then 'Unassigned' 
        else b.AM 
    end as AM
    ,rtrim(ltrim(b.Store)) Store
    ,trd.Store_ID
    ,b.appliesToPeriod
    ,isnull(trd.countLeadActual,0) as Actual
    ,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date then b.budgetValue else 0 end),0) as Budget
    ,isnull(sum(case when b.budgetType = 0 and b.budgetMonth between @start_date and @end_date and (trd.considerMe = -1 or b.StoreID < 0) then b.budgetValue else 0 end),0) as CleanBudget
    ... 
from #StoresBudgets b
    left join #temp_report_data trd on trd.store_ID = b.StoreID and trd.newSourceID = b.ProgramID
where (b.StoreDivision is null and b.ProgramName = 'NewProgram')
    group by
        b.ProgramName
        ,b.Region
        ,case when b.AM IS null and b.ProgramName IS not null 
            then 'Unassigned' 
            else b.AM 
        end
        ,rtrim(ltrim(b.Store))
        ,trd.Store_ID
        ,b.appliesToPeriod
        ,isnull(trd.countLeadActual,0);
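
One thing worth testing with this rewrite: since the two WHERE branches are mutually exclusive (StoreDivision is not null versus is null), UNION ALL should return the same rows as UNION while skipping the duplicate-eliminating sort that plain UNION adds.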

Additionally, you might be getting forced into a single-threaded execution plan by the trim functions. If you are able to do the LTRIM and RTRIM in the application that consumes the data, instead of in the query, you might get an execution plan that goes parallel.
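
If pushing the trims into the consuming application is not feasible, another option (a sketch; the column name and length are assumptions) is to trim once while the temp table is being prepared, so that the grouping key becomes a plain column:

-- hypothetical pre-trimmed column; the aggregate query would then group
-- by StoreTrimmed instead of rtrim(ltrim(b.Store))
alter table #StoresBudgets add StoreTrimmed varchar(255);

update #StoresBudgets
set StoreTrimmed = rtrim(ltrim(Store));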