Sql-server – deteriorating stored procedure running times

execution-plansql serverstored-procedures

I have a stored procedure that is executed by my web application. Today, when an end user started an operation that affected more rows than usual, I started getting calls that the wait times for the stored procedure to do its job suddenly jumped.

Symptoms:

  • The stored procedure below takes like a minute to run on production server
  • Even if no rows are affected, the sp still takes 1 minute
  • Edit: at the time I started looking into this:
    • I noticed the count (*) from materials increasing slowly like 5 rows a second.
    • At that time, interpolating from the increasing record count, the stored procedure took like 30 minutes to do its job.
    • I also saw that about 10 people (pids) had started running the stored procedure with half of them successfully. Others were repeatedly trying to launch it unsuccessfully. See log snippet at the end
    • As the stored procedure usually runs in one second, this doesn’t normally happen.
    • After the materials count stopped increasing (I suspect all the queued up stored procedures had stopped running), the running time for the sp was 1 minute
  • When running the lines between –A and –B in the SMSS query window (on production server), the operation done almost instantly
  • When running the same stored procedure on the staging server (with
    less memory and less processing capacity), the operation is done
    almost instantly. The data is the same.

Temporary fix:

  • I rename the procedure on the production server. I guess this
    recompiles the execution plan.
  • The stored procedure is now run almost instantly on the production server
  • But in like one hour or so, the situation is again just as bad and I am back to 1 minute execution times
  • Note that the execution times are bad even if no rows are inserted

Background:

  • There are about 3 million materials -rows
  • the select statement returns about 9000 rows
  • There are about 20 parts and 450 models
  • the load on the server is very light

Here is the proc:

CREATE PROCEDURE [dbo].[create_grid_materials2] 
(
    @partlistid bigint
    , @pid bigint
    , @masterid bigint
)
AS
BEGIN
    begin
        --A

        insert into material (partid, personid, modelID )
        select 
            part.id as x,
            @pid as personid,
            model.id as modelid from part
        join model on 1=1
        where (model.masterid=@masterid and model.modelSetID is null
            and part.partlistid = @partlistid
            and (part.partType = 100 or part.partType=120 or part.partType = 130))
            and not exists (select 1 from material as a1 where a1.partid = part.id 
            and a1.personid=@pid and a1.modelid=model.id)
        --B
    end
End

Edit: The join is a cartesian join and will produce materials for each (model, part) combination. I could have written ..from part cross join model ..

Sometimes the (model, part) combination yields 10 results, sometimes 10000

The stored procedure is run for each pid (about 20 rows) of one master and partlist (about 20 rows of parts) which produces a materials row for each (model, part) combination. After that, it is never run again for that pid.

The essential parts of tables have been copied below.

I assume SQL Server updates its execution plans automatically. But even if the sp-parameters change, what could have caused the recompilation to change the plan to a one with so bad performance? If the problem is the execution plan, looks like it was doing full table scans.

CREATE TABLE [dbo].[Model](
    [ID] [bigint] IDENTITY(600000,1) NOT NULL,
    [MasterID] [bigint] NULL,
    [ModelName] [nvarchar](400) NULL,
    [ModelSetID] [bigint] NULL
CONSTRAINT [PK_RfxProduct] PRIMARY KEY CLUSTERED 
([ID] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]

GO


--ALTER TABLE [dbo].[Model]  WITH CHECK ADD  CONSTRAINT [FK__Model__Master] FOREIGN KEY([MasterID]) REFERENCES [dbo].[Master] ([ID])
--GO

--------------
CREATE TABLE [dbo].[Part](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [PartListID] [bigint] NOT NULL,
    [PartType] [int] NOT NULL,
    [PartDescription] [nvarchar](max) NULL,
    [PartName] [nvarchar](4000) NULL
CONSTRAINT [PK_Question] PRIMARY KEY CLUSTERED 
([ID] ASC) WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

--ALTER TABLE [dbo].[Part]  WITH CHECK ADD  CONSTRAINT [FK_Part_PartList] FOREIGN KEY([PartListID]) REFERENCES [dbo].[PartList] ([ID])
--GO
--------------


CREATE TABLE [dbo].[Person2](
    [ID] [bigint] IDENTITY(600000,1) NOT NULL,
    [MasterID] [bigint] NOT NULL,
    [StatusFlag] [int] NOT NULL DEFAULT ((0)),
    [FlagComment] [nvarchar](max) NULL,
CONSTRAINT [PK_Person2] PRIMARY KEY CLUSTERED 
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO

--------------
CREATE TABLE [dbo].[Material](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [PartID] [bigint] NOT NULL,
    [PersonID] [bigint] NOT NULL,
    [ModelID] [bigint] NULL,
    [TextValue] [nvarchar](max) NULL
CONSTRAINT [PK_Material] PRIMARY KEY CLUSTERED 
(
[ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]

GO
ALTER TABLE [dbo].[Material]  WITH CHECK ADD  CONSTRAINT [FK_Material_Model] FOREIGN KEY([ModelID]) REFERENCES [dbo].[Model] ([ID])
GO
ALTER TABLE [dbo].[Material]  WITH CHECK ADD  CONSTRAINT [FK_Material_Person] FOREIGN KEY([PersonID]) REFERENCES [dbo].Person2 ([ID])
GO

This is what the exception looked like:

System.Web.HttpException (0x80004005): Error executing child request for handler 'System.Web.Mvc.HttpHandlerUtil+ServerExecuteHttpHandlerAsyncWrapper'. ---> System.Data.SqlClient.SqlException (0x80131904): Timeout expired.  The timeout period elapsed prior to completion of the operation or the server is not responding.
The statement has been terminated. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite)
   at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite)
   at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
System.Data.Entity.Infrastructure.Interception.InternalDispatcher`1.Dispatch[TTarget,TInterceptionContext,TResult](TTarget target, Func`3 operation, TInterceptionContext interceptionContext, Action`3 executing, Action`3 executed)
   at System.Data.Entity.Infrastructure.Interception.DbCommandDispatcher.NonQuery(DbCommand command, DbCommandInterceptionContext interceptionContext)
...

Edit: 4 days later the sp is now again running smoothly without any changes to execution plan settings. I am now trying to reproduce the problem by simulating the load. If that fails, I must wait until the problem happens again and then check the execution plan. I will update this when this happens.

Best Answer

What is up with FROM part JOIN model ON 1=1? This the same as FROM part, model, which is a cartesian join and will result in a very large number of rows. Is that join supposed to be like that?

You will likely help us help you if you provide details about the tables involved. Please "script" the definition of the tables, along with any indexes defined on those tables.

This sounds like a classic case of parameter sniffing resulting in good plan/bad plan choices for various scenarios in your data.

You may be able to get more reliable performance by making SQL Server cache different plans for different scenarios by using sp_executesql, as in the following example:

CREATE PROCEDURE [dbo].[create_grid_materials2] 
(
    @partlistid bigint
    , @pid bigint
    , @masterid bigint
)
AS
BEGIN
    begin
        DECLARE @cmd NVARCHAR(MAX);

        SET @cmd = '   
        INSERT INTO material (partid, personid, modelID)
        SELECT 
            partid = part.id
            , personid = @pid
            , modelid = model.id  
        FROM part
            INNER JOIN model ON 1=1
        WHERE (
            model.masterid = ' + CONVERT(NVARCHAR(50), @masterid) + ' 
                AND model.modelSetID IS NULL
                AND part.partlistid = ' + CONVERT(NVARCHAR(50), @partlistid) + '
                AND (
                    part.partType = 100 
                    or part.partType=120 
                    or part.partType = 130
                )
            )
            AND NOT EXISTS (
                SELECT 1 
                FROM material AS a1 
                WHERE a1.partid = part.id 
                    AND a1.personid=@pid 
                    AND a1.modelid=model.id
                )';
        DECLARE @Params VARCHAR(200);
        SET @Params = '@pid INT';
        EXEC sys.sp_executesql @cmd
            , @Params
            , @pid = @pid;
    end
End

The above code will cause a new plan to be generated for each combination of @partlistid, and @masterid.

The presumption here is some combinations of those two variables lead to a very small number of rows, whereas some combinations lead to a very large number of rows.

Forcing a plan for each combination allows SQL Server to generate more efficient plans for each. I've explicitly not included @pid since you probably want to try it with a fairly small number of combinations first; adding a third variable to the mix will make for an exponentially larger number of possible plans.