I know raising the transaction isolation level and adding a transaction can seem counterintuitive, but sometimes you have to make things queue up and wait their turn. Doing those two things eliminated the deadlocks in my simple test rig. Try it and let us know how you get on.
EDIT: Test rig
OK, here's a nice simple test rig using tempdb. It's a bit rough and ready, but it gave me an easy repro and a provable fix. Enable SQLCMD mode in SSMS before you run it. I basically pit the proc against itself in multiple sessions using either ostress or GO 100. Run the initial version of the proc and monitor for deadlocks; I get many. Then reset the rig, run the revised version of the proc, and try again.
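If you want to see the deadlocks as they happen, one option (not part of the rig itself, just how I usually watch) is trace flag 1222, which writes deadlock detail to the error log; xp_readerrorlog is undocumented but handy for searching it:
-- Write deadlock graphs to the SQL Server error log
DBCC TRACEON (1222, -1)
-- ...run the stress section below, then search the current error log...
EXEC sys.xp_readerrorlog 0, 1, N'deadlock'
GO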
USE tempdb
GO
------------------------------------------------------------------------------------------------
-- Setup START
------------------------------------------------------------------------------------------------
SET NOCOUNT ON
GO
IF OBJECT_ID('[dbo].[RawData_d_DeleteObsoleteRawData]') IS NOT NULL
DROP PROC [dbo].[RawData_d_DeleteObsoleteRawData]
GO
-- Example table type
IF EXISTS ( SELECT * FROM sys.types st JOIN sys.schemas ss ON st.schema_id = ss.schema_id WHERE st.name = N'IntListType' )
DROP TYPE IntListType
GO
CREATE TYPE IntListType AS TABLE ( Item INT PRIMARY KEY )
GO
-- Create a large table
-- DROP TABLE dbo.rawData
IF OBJECT_ID('dbo.rawData') IS NULL
CREATE TABLE dbo.rawData (
    rowId      INT IDENTITY PRIMARY KEY,
    AreaId     INT,
    MeasureId  INT,
    someData   UNIQUEIDENTIFIER DEFAULT NEWID(),
    DateEnergy DATETIME DEFAULT GETDATE(),
    addedBy    VARCHAR(30) DEFAULT SUSER_NAME(),
    ts         ROWVERSION
)
GO
TRUNCATE TABLE dbo.rawData
-- Populate table
;WITH cte AS (
    SELECT TOP 1000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
    FROM master.sys.columns c1
        CROSS JOIN master.sys.columns c2
        CROSS JOIN master.sys.columns c3
)
INSERT INTO dbo.rawData ( AreaId, MeasureId, DateEnergy )
SELECT areaId.rn, MeasureId.rn, DATEADD( day, DateEnergy.rn % 300, '1 Jan 2014' )
FROM
    ( SELECT TOP 10 rn FROM cte ) areaId,
    ( SELECT TOP 100 rn FROM cte ) MeasureId,
    ( SELECT TOP 1000 rn FROM cte ) DateEnergy,
    ( SELECT TOP 2 rn FROM cte ) doubleUp
GO
-- Original proc
CREATE PROCEDURE [dbo].[RawData_d_DeleteObsoleteRawData]
    @RemoteReadingMeasureIdList [IntListType] READONLY,
    @MaxSavingDurationInDays INT
AS
BEGIN
    DECLARE @CurrentDate SMALLDATETIME = CAST(GETUTCDATE() AS SMALLDATETIME)
    DECLARE @MaxTime SMALLDATETIME
    DECLARE @TotalDeleted INT = 0

    -- Max time for raw data
    SET @MaxTime = DATEADD(DAY, -@MaxSavingDurationInDays, @CurrentDate)

    -- Delete non remote reading raw data
    DELETE TOP (10000) FROM [dbo].[RawData]
    WHERE MeasureId NOT IN ( SELECT Item FROM @RemoteReadingMeasureIdList )
        AND DateEnergy < @MaxTime
    SET @TotalDeleted = @TotalDeleted + @@ROWCOUNT

    -- Delete remote reading raw data (keep 1 raw data row per day for remote readings)
    ;WITH cte AS
    (
        SELECT AreaId, MeasureId, DateEnergy,
            ROW_NUMBER() OVER ( PARTITION BY MeasureId, YEAR(DateEnergy), DAY(DateEnergy) ORDER BY DateEnergy DESC ) AS rn
        FROM [RawData] rd WITH (NOLOCK)
        INNER JOIN @RemoteReadingMeasureIdList m ON rd.MeasureId = m.Item
        WHERE DateEnergy < @MaxTime
    )
    DELETE TOP (10000) rd
    FROM [RawData] rd
    INNER JOIN cte ON rd.AreaId = cte.AreaId AND rd.DateEnergy = cte.DateEnergy AND rd.MeasureId = cte.MeasureId
    WHERE rn <> 1
    SET @TotalDeleted = @TotalDeleted + @@ROWCOUNT

    SELECT @TotalDeleted AS DeletedRow
END
GO
:exit
-- Setup END
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
-- Main proc call START
-- Run this section in two or more connections
-- eg using ostress
-- !!ostress.exe -E -S.\sql2008r2 -dtempdb -i"C:\temp\temp.sql" -n10 -r5 -mstress
------------------------------------------------------------------------------------------------
-- Proc call
SET NOCOUNT ON
DECLARE @tvp AS [IntListType]
INSERT INTO @tvp
VALUES ( 1 ), ( 2 ), ( 3 ), ( 5 ), ( 8 ), ( 13 )
EXEC [dbo].[RawData_d_DeleteObsoleteRawData] @tvp, 30
GO 100
-- Main proc call END
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
-- Revised proc START
------------------------------------------------------------------------------------------------
/* Reset the rig and try with the revised proc
ALTER PROCEDURE [dbo].[RawData_d_DeleteObsoleteRawData]
    @RemoteReadingMeasureIdList [IntListType] READONLY,
    @MaxSavingDurationInDays INT
AS
BEGIN
    DECLARE @CurrentDate SMALLDATETIME = CAST(GETUTCDATE() AS SMALLDATETIME)
    DECLARE @MaxTime SMALLDATETIME
    DECLARE @TotalDeleted INT = 0

    -- Max time for raw data
    SET @MaxTime = DATEADD(DAY, -@MaxSavingDurationInDays, @CurrentDate)

    -- Make rival sessions queue up rather than interleave
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

    BEGIN TRY
        BEGIN TRAN

        -- Delete non remote reading raw data
        DELETE TOP (10000) FROM [dbo].[RawData]
        WHERE MeasureId NOT IN ( SELECT Item FROM @RemoteReadingMeasureIdList )
            AND DateEnergy < @MaxTime
        SET @TotalDeleted = @TotalDeleted + @@ROWCOUNT

        -- Delete remote reading raw data (keep 1 raw data row per day for remote readings)
        ;WITH cte AS
        (
            SELECT AreaId, MeasureId, DateEnergy,
                ROW_NUMBER() OVER ( PARTITION BY MeasureId, YEAR(DateEnergy), DAY(DateEnergy) ORDER BY DateEnergy DESC ) AS rn
            FROM [RawData] rd
            INNER JOIN @RemoteReadingMeasureIdList m ON rd.MeasureId = m.Item
            WHERE DateEnergy < @MaxTime
        )
        DELETE TOP (10000) rd
        FROM [RawData] rd
        INNER JOIN cte ON rd.AreaId = cte.AreaId AND rd.DateEnergy = cte.DateEnergy AND rd.MeasureId = cte.MeasureId
        WHERE rn <> 1
        SET @TotalDeleted = @TotalDeleted + @@ROWCOUNT

        SELECT @TotalDeleted AS DeletedRow

        COMMIT TRAN
    END TRY
    BEGIN CATCH
        DECLARE @errorMessage NVARCHAR(2048) = ERROR_MESSAGE()
        IF @@TRANCOUNT > 0
        BEGIN
            PRINT 'Error handler rolling back...'
            ROLLBACK TRAN
        END
        RAISERROR( @errorMessage, 16, 1 )
    END CATCH
END
GO
*/
-- Revised proc END
------------------------------------------------------------------------------------------------
What you're describing sounds like parameter sniffing to me. Parameter sniffing occurs because SQL Server compiles and caches a procedure's execution plan whenever the plan is not found in the cache, using the parameter values passed on that execution. Subsequent executions reuse that plan. In most cases that's desirable, since it eliminates compilation time. However, if the characteristics of the data accessed by the procedure's queries vary significantly when different parameters are passed, the result can be very poor performance.
Your description of the runtimes of short date ranges versus long ranges fits that pattern perfectly. I would guess that the query optimizer is choosing different indexes in the two scenarios.
You're on the right track for your solutions. WITH RECOMPILE can resolve the parameter sniffing in some cases, but recompilation is a high cost to pay on every execution. OPTION(RECOMPILE) on individual queries can definitely help and is sometimes the best answer, but again, it comes at a high cost. OPTION(OPTIMIZE FOR UNKNOWN) is another possibility, but it resolves the issue by providing a plan that's consistent (but may still be horribly inefficient).
Start by reviewing these articles, which explain parameter sniffing far better than I can in a short answer:
Then, check the execution plans for your procedure, looking for the specific queries that vary the most between executions. Try OPTION(RECOMPILE), OPTION(OPTIMIZE FOR UNKNOWN), or Kimberly's sp_executesql method on those, rather than the entire procedure.
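For illustration, here's roughly what the statement-level hints look like; the table and parameter names below are placeholders, not taken from your procedure:
-- Recompile just this statement each run, so the plan fits the actual date range:
SELECT SUM(Amount)
FROM dbo.Orders
WHERE OrderDate >= @StartDate AND OrderDate < @EndDate
OPTION ( RECOMPILE )

-- Or ask the optimizer to plan for average density instead of the sniffed values:
SELECT SUM(Amount)
FROM dbo.Orders
WHERE OrderDate >= @StartDate AND OrderDate < @EndDate
OPTION ( OPTIMIZE FOR UNKNOWN )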
Best Answer
At one point I answered this question over at StackOverflow, but it seems like it would be useful to have that information at DBA.SE as well, revised and updated.
Just to be totally explicit: TSQL does not (by itself) have the ability to launch other TSQL operations asynchronously.
That doesn't mean you don't still have a lot of options (some of them mentioned in other answers):
- Launch multiple SQL Agent jobs with sp_start_job. If you need to monitor their progress programmatically, just make sure the jobs each update a custom JOB_PROGRESS table (or you can check whether they have finished yet using the undocumented function xp_sqlagent_enum_jobs, as described in this excellent article by Gregory A. Larsen). You have to create as many separate jobs as you want parallel processes running, even if they are running the same stored proc with different parameters.
- Use sp_oacreate and sp_oamethod to launch a new process calling each other stored proc, as described in this article, also by Gregory A. Larsen.
- Use Parallel_AddSql and Parallel_Execute, as described in this article by Alan Kaplan (SQL2005+ only).
If it were me, I'd probably use multiple SQL Agent jobs in simpler scenarios, and an SSIS package in more complex scenarios.
In your case, unless you're trying to launch 200 separate threads, multiple scheduled Agent jobs sounds like a simple and manageable choice.
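As a rough sketch of the Agent-job approach (the job names and the JOB_PROGRESS columns here are made up for illustration; each job would wrap one of your stored proc calls):
-- sp_start_job returns as soon as the job is requested, so these run in parallel
EXEC msdb.dbo.sp_start_job @job_name = N'ProcessBatch_1'
EXEC msdb.dbo.sp_start_job @job_name = N'ProcessBatch_2'

-- Poll a custom progress table that each job updates as it works
SELECT JobName, RowsProcessed, FinishedAt
FROM dbo.JOB_PROGRESS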
One final comment: SQL already attempts to parallelize individual operations whenever it can*. This means that running 2 tasks at the same time instead of one after the other is no guarantee that they will finish sooner. Test carefully to see whether it actually improves anything or not.
We had a developer who created a DTS package to run 8 tasks at the same time. Unfortunately, it was only a 4-CPU server :)
*Assuming default settings. This can be modified by altering the server's Maximum Degree of Parallelism or Affinity Mask, or by using the MAXDOP query hint.
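For example, the query hint caps parallelism for a single statement regardless of the server-wide setting (the table name here is just a placeholder):
-- Restrict this statement to a single scheduler, whatever the server MAXDOP is
SELECT COUNT(*)
FROM dbo.SomeBigTable
OPTION ( MAXDOP 1 )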