SQL Server – Inconsistent Execution Plan for Stored Procedure

execution-plan, sql-server, sql-server-2014, stored-procedures

We have a business-critical stored procedure that normally runs daily at 2am from a scheduled job (in an SSIS package) on the production DB server. The same procedure/package is called 15 minutes later by a second SQL Agent job running on a different server (as an emergency failover, in case anything goes awry with the first job).

The procedure is defined WITH RECOMPILE.
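For reference, a minimal sketch of what that looks like (the procedure name and body here are placeholders, not the real code):

-- Hypothetical definition; WITH RECOMPILE discards the plan after every
-- execution, so each run is optimized from scratch.
CREATE PROCEDURE dbo.usp_NightlyProcess
WITH RECOMPILE
AS
BEGIN
    SET NOCOUNT ON;
    -- ...procedure body elided...
END;

Note that procedure-level WITH RECOMPILE compiles the whole batch at the start of execution; it does not cause statement-level recompiles partway through, so row counts loaded into table variables during the run are still unknown at compile time.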

The procedure normally executes in about 45 seconds. Last Wednesday, and again this morning (also Wednesday, coincidence?!?), the 2am procedure took 90 minutes to execute. While it was executing, the 2:15am job ran and that execution took the usual 45 seconds.

I have the execution plans from both situations. There are some table variable operations that should carry estimated row counts in the neighborhood of 200K rows. The faulty plan reports these table variables with an estimated 130 billion rows. [Side note: I have already rewritten the code to use temp tables instead of table variables, based on this discussion, and will be moving it to production in the near future.]
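For what it's worth, a minimal sketch of that kind of rewrite (the table and column names are illustrative, not the real ones):

-- Before: table variable; no statistics, so cardinality is guessed
DECLARE @Patients TABLE (PatientID integer NOT NULL, FirstTestDate date NULL);

-- After: temp table; supports statistics, indexes, and accurate estimates
CREATE TABLE #Patients (PatientID integer NOT NULL, FirstTestDate date NULL);
CREATE INDEX IX_Patients ON #Patients (PatientID, FirstTestDate);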

Our monitoring software (Solar Winds DPA) reports excessive CXPACKET waits for the 2am execution. This seems to indicate issues with parallelism and is likely related to the table variables being used in the procedure.
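If you want to confirm the waits outside DPA, the wait-stats DMV is a quick check; a minimal sketch:

-- Cumulative parallelism-related waits since the last service restart
SELECT wait_type, waiting_tasks_count, wait_time_ms, signal_wait_time_ms
FROM sys.dm_os_wait_stats
WHERE wait_type = 'CXPACKET';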

There is still user activity on the server during this time, and some scheduled jobs, but nothing that I see that would affect this procedure or its execution plan. An index maintenance job is run at 2:30am.

I understand the poorly-performing execution plan is related to the table variables, but why would this same procedure executed 15 minutes later have such a drastically different execution plan (and why does the 2am execution run fine the rest of the week)?

Here are links to the .sqlplan files: The Bad Plan and the Good Plan.

Best Answer

For the 'good' plan, all the table variable cardinality estimates are 1 row. This is the most common outcome when using table variables, unless trace flag 2453 is enabled, or a statement-level recompilation occurs (for example because OPTION (RECOMPILE) is used, or one of the regular tables in the query has passed its recompilation threshold).
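A minimal repro of the difference (the object names here are made up):

DECLARE @T TABLE (ID integer PRIMARY KEY);

INSERT @T (ID)
SELECT TOP (200000)
    ROW_NUMBER() OVER (ORDER BY (SELECT NULL))
FROM sys.all_columns AS AC1
CROSS JOIN sys.all_columns AS AC2;

-- Estimated rows from @T: 1 (the standard table variable guess)
SELECT COUNT_BIG(*) FROM @T;

-- Estimated rows from @T: 200,000 (statement-level recompile sees the count)
SELECT COUNT_BIG(*) FROM @T OPTION (RECOMPILE);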

For the 'bad' plan, table variable cardinalities are accurate, implying one of the conditions mentioned above was in play. This may seem counter-intuitive, since better information usually leads to better plans, but table variables do not support statistics, so the extra information is rather limited. The optimizer knows there are 'x' rows, but has no idea about the distribution of values within those rows. A different kind of incomplete information, perhaps, but still.

Anyway, it just so happens that the plan built when the table variables are assumed to contain one row produces good performance. There is more than a little luck involved in this. Unless you enjoy debugging rare plan regressions, I would avoid relying on luck too much.

Specifics

The faulty plan reports these table variables with an estimated 130 billion rows.

The part of the plan you are referring to is:

[Image: performance spool plan fragment]

As you can see, it is the Table Spool that is estimated to produce ~130 billion rows; the table variable itself emits only 198,411 rows.

The sort and spool combination is designed to optimize repeated scans by caching the result from one iteration of the nested loops join and replaying the saved result on the next iteration if the correlated parameter(s) have not changed. The sort ensures any potential duplicates arrive together, since the spool only caches the most recent result. The estimate from the spool is the total number of rows across all iterations (198,411 rows from the table variable * 653,969 iterations).
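That arithmetic checks out: 198,411 * 653,969 ≈ 129.75 billion, which is the ~130 billion figure reported on the spool.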

The useful predicate relating the rows from the sort with the table variable is stuck on the nested loops left outer join iterator:

[Image: join predicate]

Looking at this in conjunction with the output columns from the table variable, we can conclude that an index on the table variable keyed on (PatientID, FirstTestDate) would almost certainly eliminate this problem.
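Since this is SQL Server 2014, that index can be declared inline with the table variable (a sketch only; the real column types are unknown, so these are assumptions):

DECLARE @Results TABLE
(
    PatientID     integer NOT NULL,
    FirstTestDate date    NULL,
    -- Inline index syntax for table variables is available from SQL Server 2014
    INDEX IX_Patient_FirstTest (PatientID, FirstTestDate)
);

On earlier versions, the only way to index a table variable is through a PRIMARY KEY or UNIQUE constraint in the declaration.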

A suitable index on sub_PSTRules could also remove the index and table spools seen there, though these are not having much of an effect on performance at this stage:

[Image: index spool plan fragment]

Nevertheless, it is wasteful to have SQL Server build a temporary nonclustered index each time, then throw it away at the end. The missing (filtered) index is likely:

CREATE INDEX give_me_a_good_name
ON dbo.sub_PSTRules
    (SubscriberSID, CinicSID, OfficeSID)
INCLUDE
    (PSTQuestionGroupSID)
WHERE
    OfficeSID IS NULL;