I know raising the transaction isolation level and adding a transaction can seem counterintuitive, but sometimes you have to make things queue up and wait their turn. Doing those two things eliminated the deadlocks in my simple test rig. Try it and let us know how you get on.
EDIT: Test rig
OK, here's a nice simple test rig using tempdb. It's a bit rough and ready, but it gave me an easy repro and a provable fix. Enable SQLCMD mode in SSMS before you run it. I basically pit the proc against itself in multiple sessions using either ostress or GO 100. Run the initial version of the proc and monitor for deadlocks; I get many. Then reset the rig, run the revised version of the proc, and try again.
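If you want to see the deadlocks as they happen, one option (not part of the rig itself, just how I usually watch) is trace flag 1222, which writes deadlock detail to the error log; xp_readerrorlog is undocumented but handy for searching it:
-- Write deadlock graphs to the SQL Server error log
DBCC TRACEON (1222, -1)
-- ...run the stress section below, then search the current error log...
EXEC sys.xp_readerrorlog 0, 1, N'deadlock'
GO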
USE tempdb
GO
------------------------------------------------------------------------------------------------
-- Setup START
------------------------------------------------------------------------------------------------
SET NOCOUNT ON
GO
IF OBJECT_ID('[dbo].[RawData_d_DeleteObsoleteRawData]') IS NOT NULL
DROP PROC [dbo].[RawData_d_DeleteObsoleteRawData]
GO
-- Example table type
IF EXISTS ( SELECT * FROM sys.types st JOIN sys.schemas ss ON st.schema_id = ss.schema_id WHERE st.name = N'IntListType' )
DROP TYPE IntListType
GO
CREATE TYPE IntListType AS TABLE ( Item INT PRIMARY KEY )
GO
-- Create a large table
-- DROP TABLE dbo.rawData
IF OBJECT_ID('dbo.rawData') IS NULL
CREATE TABLE dbo.rawData (
    rowId      INT IDENTITY PRIMARY KEY,
    AreaId     INT,
    MeasureId  INT,
    someData   UNIQUEIDENTIFIER DEFAULT NEWID(),
    DateEnergy DATETIME DEFAULT GETDATE(),
    addedBy    VARCHAR(30) DEFAULT SUSER_NAME(),
    ts         ROWVERSION
)
GO
TRUNCATE TABLE dbo.rawData
-- Populate table
;WITH cte AS (
    SELECT TOP 1000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
    FROM master.sys.columns c1
        CROSS JOIN master.sys.columns c2
        CROSS JOIN master.sys.columns c3
)
INSERT INTO dbo.rawData ( AreaId, MeasureId, DateEnergy )
SELECT areaId.rn, MeasureId.rn, DATEADD( day, DateEnergy.rn % 300, '1 Jan 2014' )
FROM
    ( SELECT TOP 10 rn FROM cte ) areaId,
    ( SELECT TOP 100 rn FROM cte ) MeasureId,
    ( SELECT TOP 1000 rn FROM cte ) DateEnergy,
    ( SELECT TOP 2 rn FROM cte ) doubleUp
GO
-- Original proc
CREATE PROCEDURE [dbo].[RawData_d_DeleteObsoleteRawData]
    @RemoteReadingMeasureIdList [IntListType] READONLY,
    @MaxSavingDurationInDays INT
AS
BEGIN
    DECLARE @CurrentDate SMALLDATETIME = CAST(GETUTCDATE() AS SMALLDATETIME)
    DECLARE @MaxTime SMALLDATETIME
    DECLARE @TotalDeleted INT = 0

    -- Max time for raw data
    SET @MaxTime = DATEADD(DAY, -@MaxSavingDurationInDays, @CurrentDate)

    -- Delete non remote reading raw data
    DELETE TOP (10000) FROM [dbo].[RawData]
    WHERE MeasureId NOT IN ( SELECT Item FROM @RemoteReadingMeasureIdList )
        AND DateEnergy < @MaxTime
    SET @TotalDeleted = @TotalDeleted + @@ROWCOUNT

    -- Delete remote reading raw data (keep 1 raw data row per day for remote readings)
    ;WITH cte AS
    (
        SELECT AreaId, MeasureId, DateEnergy,
            ROW_NUMBER() OVER ( PARTITION BY MeasureId, YEAR(DateEnergy), DAY(DateEnergy) ORDER BY DateEnergy DESC ) AS rn
        FROM [RawData] rd WITH (NOLOCK)
        INNER JOIN @RemoteReadingMeasureIdList m ON rd.MeasureId = m.Item
        WHERE DateEnergy < @MaxTime
    )
    DELETE TOP (10000) rd
    FROM [RawData] rd
    INNER JOIN cte ON rd.AreaId = cte.AreaId AND rd.DateEnergy = cte.DateEnergy AND rd.MeasureId = cte.MeasureId
    WHERE rn <> 1
    SET @TotalDeleted = @TotalDeleted + @@ROWCOUNT

    SELECT @TotalDeleted AS DeletedRow
END
GO
:exit
-- Setup END
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
-- Main proc call START
-- Run this section in two or more connections
-- eg using ostress
-- !!ostress.exe -E -S.\sql2008r2 -dtempdb -i"C:\temp\temp.sql" -n10 -r5 -mstress
------------------------------------------------------------------------------------------------
-- Proc call
SET NOCOUNT ON
DECLARE @tvp AS [IntListType]
INSERT INTO @tvp
VALUES ( 1 ), ( 2 ), ( 3 ), ( 5 ), ( 8 ), ( 13 )
EXEC [dbo].[RawData_d_DeleteObsoleteRawData] @tvp, 30
GO 100
-- Main proc call END
------------------------------------------------------------------------------------------------
------------------------------------------------------------------------------------------------
-- Revised proc START
------------------------------------------------------------------------------------------------
/* Reset the rig and try with the revised proc
ALTER PROCEDURE [dbo].[RawData_d_DeleteObsoleteRawData]
    @RemoteReadingMeasureIdList [IntListType] READONLY,
    @MaxSavingDurationInDays INT
AS
BEGIN
    DECLARE @CurrentDate SMALLDATETIME = CAST(GETUTCDATE() AS SMALLDATETIME)
    DECLARE @MaxTime SMALLDATETIME
    DECLARE @TotalDeleted INT = 0

    -- Max time for raw data
    SET @MaxTime = DATEADD(DAY, -@MaxSavingDurationInDays, @CurrentDate)

    -- Make rival sessions queue up rather than interleave
    SET TRANSACTION ISOLATION LEVEL REPEATABLE READ

    BEGIN TRY
        BEGIN TRAN

        -- Delete non remote reading raw data
        DELETE TOP (10000) FROM [dbo].[RawData]
        WHERE MeasureId NOT IN ( SELECT Item FROM @RemoteReadingMeasureIdList )
            AND DateEnergy < @MaxTime
        SET @TotalDeleted = @TotalDeleted + @@ROWCOUNT

        -- Delete remote reading raw data (keep 1 raw data row per day for remote readings)
        ;WITH cte AS
        (
            SELECT AreaId, MeasureId, DateEnergy,
                ROW_NUMBER() OVER ( PARTITION BY MeasureId, YEAR(DateEnergy), DAY(DateEnergy) ORDER BY DateEnergy DESC ) AS rn
            FROM [RawData] rd
            INNER JOIN @RemoteReadingMeasureIdList m ON rd.MeasureId = m.Item
            WHERE DateEnergy < @MaxTime
        )
        DELETE TOP (10000) rd
        FROM [RawData] rd
        INNER JOIN cte ON rd.AreaId = cte.AreaId AND rd.DateEnergy = cte.DateEnergy AND rd.MeasureId = cte.MeasureId
        WHERE rn <> 1
        SET @TotalDeleted = @TotalDeleted + @@ROWCOUNT

        SELECT @TotalDeleted AS DeletedRow

        COMMIT TRAN
    END TRY
    BEGIN CATCH
        DECLARE @errorMessage NVARCHAR(2048) = ERROR_MESSAGE()
        IF @@TRANCOUNT > 0
        BEGIN
            PRINT 'Error handler rolling back...'
            ROLLBACK TRAN
        END
        RAISERROR( @errorMessage, 16, 1 )
    END CATCH
END
GO
*/
-- Revised proc END
------------------------------------------------------------------------------------------------
What you're describing sounds like parameter sniffing to me. Parameter sniffing occurs because SQL Server compiles and caches a procedure's execution plan whenever the plan is not found in the cache, using the parameter values passed on that execution. Subsequent executions reuse that plan. In most cases that's desirable, since it eliminates compilation time. However, if the characteristics of the data accessed by the procedure's queries vary significantly when different parameters are passed, the result can be very poor performance.
Your description of the runtimes of short date ranges versus long ranges fits that pattern perfectly. I would guess that the query optimizer is choosing different indexes in the two scenarios.
You're on the right track for your solutions. WITH RECOMPILE can resolve the parameter sniffing in some cases, but recompilation is a high cost to pay on every execution. OPTION(RECOMPILE) on individual queries can definitely help and is sometimes the best answer, but again, it comes at a high cost. OPTION(OPTIMIZE FOR UNKNOWN) is another possibility, but it resolves the issue by providing a plan that's consistent (but may still be horribly inefficient).
Start by reviewing these articles, which explain parameter sniffing far better than I can in a short answer:
Then, check the execution plans for your procedure, looking for the specific queries that vary the most between executions. Try OPTION(RECOMPILE), OPTION(OPTIMIZE FOR UNKNOWN), or Kimberly's sp_executesql method on those, rather than the entire procedure.
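For illustration, here's roughly what the statement-level hints look like; the table and parameter names below are placeholders, not taken from your procedure:
-- Recompile just this statement each run, so the plan fits the actual date range:
SELECT SUM(Amount)
FROM dbo.Orders
WHERE OrderDate >= @StartDate AND OrderDate < @EndDate
OPTION ( RECOMPILE )

-- Or ask the optimizer to plan for average density instead of the sniffed values:
SELECT SUM(Amount)
FROM dbo.Orders
WHERE OrderDate >= @StartDate AND OrderDate < @EndDate
OPTION ( OPTIMIZE FOR UNKNOWN )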
Best Answer
At one point I answered this question over at StackOverflow, but it seems like it would be useful to have that information at DBA.SE as well, revised and updated.
Just to be totally explicit: TSQL does not (by itself) have the ability to launch other TSQL operations asynchronously.
That doesn't mean you don't still have a lot of options (some of them mentioned in other answers):
- Launch multiple SQL Agent jobs with sp_start_job. If you need to monitor their progress programmatically, just make sure the jobs each update a custom JOB_PROGRESS table (or you can check whether they have finished yet using the undocumented function xp_sqlagent_enum_jobs, as described in this excellent article by Gregory A. Larsen). You have to create as many separate jobs as you want parallel processes running, even if they are running the same stored proc with different parameters.
- Use sp_oacreate and sp_oamethod to launch a new process calling each other stored proc, as described in this article, also by Gregory A. Larsen.
- Use Parallel_AddSql and Parallel_Execute, as described in this article by Alan Kaplan (SQL2005+ only).
If it were me, I'd probably use multiple SQL Agent jobs in simpler scenarios, and an SSIS package in more complex scenarios.
In your case, unless you're trying to launch 200 separate threads, multiple scheduled Agent jobs sounds like a simple and manageable choice.
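As a rough sketch of the Agent-job approach (the job names and the JOB_PROGRESS columns here are made up for illustration; each job would wrap one of your stored proc calls):
-- sp_start_job returns as soon as the job is requested, so these run in parallel
EXEC msdb.dbo.sp_start_job @job_name = N'ProcessBatch_1'
EXEC msdb.dbo.sp_start_job @job_name = N'ProcessBatch_2'

-- Poll a custom progress table that each job updates as it works
SELECT JobName, RowsProcessed, FinishedAt
FROM dbo.JOB_PROGRESS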
One final comment: SQL already attempts to parallelize individual operations whenever it can*. This means that running 2 tasks at the same time instead of one after the other is no guarantee that they will finish sooner. Test carefully to see whether it actually improves anything or not.
We had a developer who created a DTS package to run 8 tasks at the same time. Unfortunately, it was only a 4-CPU server :)
*Assuming default settings. This can be modified by altering the server's Maximum Degree of Parallelism or Affinity Mask, or by using the MAXDOP query hint.
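For example, the query hint caps parallelism for a single statement regardless of the server-wide setting (the table name here is just a placeholder):
-- Restrict this statement to a single scheduler, whatever the server MAXDOP is
SELECT COUNT(*)
FROM dbo.SomeBigTable
OPTION ( MAXDOP 1 )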