I have a stored procedure that is executed by my web application. Today, when an end user started an operation that affected more rows than usual, I began getting calls that wait times for the stored procedure had suddenly jumped.
Symptoms:
- The stored procedure below takes about a minute to run on the production server
- Even if no rows are affected, the procedure still takes about a minute
- Edit: at the time I started looking into this:
- I noticed count(*) from Material increasing slowly, about 5 rows a second.
- Interpolating from the increasing record count, the stored procedure was at that point on track to take about 30 minutes to do its job.
- I also saw that about 10 people (pids) had started running the stored procedure, about half of them successfully. The others were repeatedly trying to launch it without success. See the log snippet at the end.
- The stored procedure usually runs in one second, so this doesn't normally happen.
- After the Material count stopped increasing (I suspect all the queued-up stored procedure calls had finished), the running time for the procedure settled at about 1 minute
- When running the lines between --A and --B in an SSMS query window (on the production server), the operation completes almost instantly
- When running the same stored procedure on the staging server (which has less memory and less processing capacity), the operation completes almost instantly. The data is the same.
Temporary fix:
- I rename the procedure on the production server. I guess this recompiles the execution plan.
- The stored procedure then runs almost instantly on the production server
- But within an hour or so the situation is just as bad again and I am back to 1-minute execution times
- Note that the execution times are bad even if no rows are inserted
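If renaming really works by forcing a recompile, the same effect should be achievable without the rename by marking the procedure for recompilation (a sketch, using the procedure name from below):

```sql
-- Mark the procedure so its cached plan is discarded and a fresh plan
-- is compiled on the next execution; equivalent to the rename trick.
EXEC sp_recompile N'dbo.create_grid_materials2';
```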
Background:
- There are about 3 million Material rows
- The select statement returns about 9,000 rows
- There are about 20 parts and 450 models
- The load on the server is very light
Here is the proc:
CREATE PROCEDURE [dbo].[create_grid_materials2]
(
      @partlistid bigint
    , @pid bigint
    , @masterid bigint
)
AS
BEGIN
    --A
    INSERT INTO material (partid, personid, modelID)
    SELECT
        part.id  AS partid,
        @pid     AS personid,
        model.id AS modelid
    FROM part
    JOIN model ON 1 = 1
    WHERE model.masterid = @masterid
      AND model.modelSetID IS NULL
      AND part.partlistid = @partlistid
      AND part.partType IN (100, 120, 130)
      AND NOT EXISTS (SELECT 1 FROM material AS a1
                      WHERE a1.partid = part.id
                        AND a1.personid = @pid
                        AND a1.modelid = model.id)
    --B
END
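For reference, the application runs the procedure once per pid, along these lines (the ID values here are made up for illustration):

```sql
-- Hypothetical invocation; the actual IDs come from the application.
EXEC [dbo].[create_grid_materials2]
      @partlistid = 42
    , @pid        = 600123
    , @masterid   = 600001;
```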
Edit: The join is a Cartesian join and produces a Material row for each (model, part) combination. I could have written `... from part cross join model ...`
Sometimes the (model, part) combination yields 10 rows, sometimes 10,000.
The stored procedure is run once for each pid (about 20 rows) of one master and one partlist (about 20 Part rows), producing a Material row for each (model, part) combination. After that, it is never run again for that pid.
The essential parts of the tables are copied below.
I assume SQL Server updates its execution plans automatically. But even if the procedure's parameters change, what could have caused recompilation to produce a plan with such bad performance? If the problem is the execution plan, it looks like it was doing full table scans.
CREATE TABLE [dbo].[Model](
    [ID] [bigint] IDENTITY(600000,1) NOT NULL,
    [MasterID] [bigint] NULL,
    [ModelName] [nvarchar](400) NULL,
    [ModelSetID] [bigint] NULL,
    CONSTRAINT [PK_RfxProduct] PRIMARY KEY CLUSTERED ([ID] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY]
GO
--ALTER TABLE [dbo].[Model] WITH CHECK ADD CONSTRAINT [FK__Model__Master]
--    FOREIGN KEY([MasterID]) REFERENCES [dbo].[Master] ([ID])
--GO
--------------
CREATE TABLE [dbo].[Part](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [PartListID] [bigint] NOT NULL,
    [PartType] [int] NOT NULL,
    [PartDescription] [nvarchar](max) NULL,
    [PartName] [nvarchar](4000) NULL,
    CONSTRAINT [PK_Question] PRIMARY KEY CLUSTERED ([ID] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
--ALTER TABLE [dbo].[Part] WITH CHECK ADD CONSTRAINT [FK_Part_PartList]
--    FOREIGN KEY([PartListID]) REFERENCES [dbo].[PartList] ([ID])
--GO
--------------
CREATE TABLE [dbo].[Person2](
    [ID] [bigint] IDENTITY(600000,1) NOT NULL,
    [MasterID] [bigint] NOT NULL,
    [StatusFlag] [int] NOT NULL DEFAULT ((0)),
    [FlagComment] [nvarchar](max) NULL,
    CONSTRAINT [PK_Person2] PRIMARY KEY CLUSTERED ([ID] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
--------------
CREATE TABLE [dbo].[Material](
    [ID] [bigint] IDENTITY(1,1) NOT NULL,
    [PartID] [bigint] NOT NULL,
    [PersonID] [bigint] NOT NULL,
    [ModelID] [bigint] NULL,
    [TextValue] [nvarchar](max) NULL,
    CONSTRAINT [PK_Material] PRIMARY KEY CLUSTERED ([ID] ASC)
        WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF,
              ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO
ALTER TABLE [dbo].[Material] WITH CHECK ADD CONSTRAINT [FK_Material_Model]
    FOREIGN KEY([ModelID]) REFERENCES [dbo].[Model] ([ID])
GO
ALTER TABLE [dbo].[Material] WITH CHECK ADD CONSTRAINT [FK_Material_Person]
    FOREIGN KEY([PersonID]) REFERENCES [dbo].[Person2] ([ID])
GO
This is what the exception looked like:
System.Web.HttpException (0x80004005): Error executing child request for handler 'System.Web.Mvc.HttpHandlerUtil+ServerExecuteHttpHandlerAsyncWrapper'. ---> System.Data.SqlClient.SqlException (0x80131904): Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. The statement has been terminated. ---> System.ComponentModel.Win32Exception (0x80004005): The wait operation timed out
   at System.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at System.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at System.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
   at System.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString)
   at System.Data.SqlClient.SqlCommand.RunExecuteReaderTds(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, Boolean async, Int32 timeout, Task& task, Boolean asyncWrite, SqlDataReader ds)
   at System.Data.SqlClient.SqlCommand.RunExecuteReader(CommandBehavior cmdBehavior, RunBehavior runBehavior, Boolean returnStream, String method, TaskCompletionSource`1 completion, Int32 timeout, Task& task, Boolean asyncWrite)
   at System.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, String methodName, Boolean sendToPipe, Int32 timeout, Boolean asyncWrite)
   at System.Data.SqlClient.SqlCommand.ExecuteNonQuery()
   at System.Data.Entity.Infrastructure.Interception.InternalDispatcher`1.Dispatch[TTarget,TInterceptionContext,TResult](TTarget target, Func`3 operation, TInterceptionContext interceptionContext, Action`3 executing, Action`3 executed)
   at System.Data.Entity.Infrastructure.Interception.DbCommandDispatcher.NonQuery(DbCommand command, DbCommandInterceptionContext interceptionContext)
   ...
Edit: 4 days later, the procedure is again running smoothly, without any changes to execution plan settings. I am now trying to reproduce the problem by simulating the load. If that fails, I must wait until the problem happens again and then check the execution plan. I will update this when that happens.
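Next time the problem occurs, the cached plan for the procedure can be pulled straight from the plan cache for inspection (a sketch; `sys.dm_exec_procedure_stats` is available on SQL Server 2008 and later):

```sql
-- Fetch the currently cached plan (as XML) and execution stats
-- for the procedure, to check for table scans.
SELECT qp.query_plan,
       ps.execution_count,
       ps.total_elapsed_time,
       ps.total_worker_time
FROM sys.dm_exec_procedure_stats AS ps
CROSS APPLY sys.dm_exec_query_plan(ps.plan_handle) AS qp
WHERE ps.database_id = DB_ID()
  AND ps.object_id = OBJECT_ID(N'dbo.create_grid_materials2');
```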
Best Answer
What is up with

FROM part JOIN model ON 1=1

? This is the same as `FROM part, model`, which is a Cartesian join and will result in a very large number of rows. Is that join supposed to be like that?
You will likely help us help you if you provide details about the tables involved. Please "script" the definition of the tables, along with any indexes defined on those tables.
This sounds like a classic case of parameter sniffing resulting in good plan/bad plan choices for various scenarios in your data.
You may be able to get more reliable performance by making SQL Server cache different plans for different scenarios using `sp_executesql`, so that a new plan is generated for each combination of `@partlistid` and `@masterid`. The presumption here is that some combinations of those two variables lead to a very small number of rows, whereas other combinations lead to a very large number of rows. Forcing a plan for each combination allows SQL Server to generate a more efficient plan for each. I've explicitly not included `@pid`, since you probably want to try it with a fairly small number of combinations first; adding a third variable to the mix makes for an exponentially larger number of possible plans.