Sql-server – SQL Server Agent Jobs – Long running job delaying other jobs

sql serversql-server-agent

I have several agent jobs that run on different schedules.

Job 1 runs every 5 minutes, only executing a stored procedure that usually takes less than 30 seconds to run.

This job Selects from table "users" then uses a cursor to loop through the results and insert records into local tables. After the cursor finishes 4 batch calls are made to a remote linked server to insert the records from the local tables to remote tables. Then the stored procedure ends.

The stored procedure sql is wrapped in a Begin, Begin Try, End Try, End statement.

Job 2, runs every hour, it too only executes a stored procedure, however, this job can take 20 minutes to run.

Job 2 performs two different Selects both referencing the "users" table (which Job 1 also selects from) and inserts the results into local tables, but without using a cursor. Then 4 batch calls to a remote linked server to insert the records from the local tables to remote tables. This stored procedure is also wrapped in a Begin, Begin Try, End Try, End.

When I check the Job Activity Monitor log for Job 1, it shows the duration is consistently under 30 seconds, except for the time that it runs on the hour, with Job 2. At this time it shows the duration 20+ minutes – I guess as a result of Job 2, the long running job, running.

Does this mean that Job 1 begins and as it begins Job 2 starts causing Job 1 to not complete until after Job 2 finishes (the reason why SQL agent log shows the duration of Job 1 as 20 minutes, even though this job never takes 20 minutes to actually execute)?

Is there a way that I can keep this long running job isolated so that it doesn't cause other fast running jobs to be delayed?

Best Answer

An easy workaround solution would be to nudge either of the jobs to start earlier/later and to not run together with the other job.

E.g.

    Job 1 Schedule: 00:14, 00:29, 00:44, 00:59, 01:14, ....
    Job 2 Schedule:                        01:00, .......

It's not an issue with the job scheduler, but an issue with the underlying tables that are being accessed. When Job 2 starts, it manages to acquire a lock on the users table (possibly: IX, IU) before the Job 1 is able to acquire its required locks on the same users table. Then Job 1 has to wait.

You are encountering blocking as previously observed by @MaxVernon. You could run the following script at the full hour to see if you can catch the statements involved:

SELECT sp.spid,
       sp.kpid,
       sp.blocked,
       sp.hostprocess,
       sp.waittype,
       sp.lastwaittype,
       DB_NAME(sp.dbid),
       sp.cpu,
       sp.physical_io,
       sp.memusage,
       CASE 
            WHEN sp.blocked = 0 THEN sp.spid
            WHEN sp.blocked != 0 THEN blocked
       END               AS chain,
       dest.text
FROM   sys.sysprocesses  AS sp
       OUTER APPLY sys.dm_exec_sql_text(sp.sql_handle) AS dest
WHERE  spid > 50
       AND spid <> @@spid
       AND (
               spid IN (SELECT blocked
                        FROM   sys.sysprocesses
                        WHERE  blocked != 0
                               AND spid > 50
                               AND spid != @@spid)
               OR blocked != 0
           )

The output will show statements that are blocking or blocked. You will probably find that the statements are from the Job 1 and Job 2.

Possible solutions:

Adding a NOLOCK hint can help, but should be avoided if possible.
Then you could also consider optimising the code so that the steps in Job 2 run faster.
Changing the schedules of the Job 1 and/or Job 2.

Related Solutions

Sql-server – Getting a “Request to run job refused because the job is already running” When no job is running

Alright so the problem I was having was that the write time to the log file for job completion occasionally takes longer than it takes to run the stored procedure that restarts this loop.

The solution to this is my original workaround.

In case anyone wants to use it here is the code.

BEGIN
WAITFOR DELAY '00:01';
    EXECUTE dbo.LoopStageStore
END; GO

Just make it wait a little bit.

Sql-server – Run Multiple Remote Jobs

I'd say the easiest possible way to do this is through PowerShell and SMO. Take the following code for instance:

[System.Reflection.Assembly]::LoadWithPartialName("Microsoft.SqlServer.Smo") |
    Out-Null

$SqlServerNames = "Server1", "Server2"
$SqlJobName = "YourJob"

foreach ($SqlServerName in $SqlServerNames) {
    $SqlServer = New-Object Microsoft.SqlServer.Management.Smo.Server($SqlServerName)

    $SqlServer.JobServer.Jobs[$SqlJobName].Start()
}

All this does it loop through a list of servers (explicitly stated at as string array variable, but can easily be obtained also from a SQL Server database table, a text file, etc.), connects to the current server, and then starts the job with the job named whatever you set the $SqlJobName variable to.

It's also worth noting that the Start() function will not wait for the job to complete before continuing code execution.

As you can see, in very few lines of PowerShell code you are able to accomplish this task. But I'd take it a step further, such as error handling. You don't want this to bomb out and not know what failed, when it failed, and what servers it did/didn't run on.

foreach ($SqlServerName in $SqlServerNames) {
    try {
        $SqlServer = New-Object Microsoft.SqlServer.Management.Smo.Server($SqlServerName)

        $SqlServer.JobServer.Jobs[$SqlJobName].Start()
    }
    catch {
        # add your error handling code here
        ## for instance, write the error to a text file
    }
}

And, as always with any code, test this out in a non-production environment to ensure that it does what you think it will and should do. Re-work it for your needs/environment.

As per RoKa's comment:

Can this be edited to check for whether the job is currently running, error handling?

Great point, and definitely something to check for (as well as if the job actually exists by checking for a non-null value):

foreach ($SqlServerName in $SqlServerNames) {
    try {
        $SqlServer = New-Object Microsoft.SqlServer.Management.Smo.Server($SqlServerName)

        $SqlJob = $SqlServer.JobServer.Jobs[$SqlJobName]

        if (($SqlJob) -and ($SqlJob.CurrentRunStatus -eq [Microsoft.SqlServer.Management.Smo.Agent.JobExecutionStatus]::Idle)) {
            $SqlJob.Start()
        }

        else {
            # handle/log accordingly if the job doesn't exist or it's not idle
        }

    }
    catch {
        # add your error handling code here
        ## for instance, write the error to a text file
    }
}

EDIT: While re-reading your question, I see SQL Server 2000. Are those going to be the target instances? If that is the case, and you do decide to go with the PowerShell/SMO answer I provided, then definitely definitely definitely test and ensure it is going to do what you want. Test this out very far from production and be 100% sure it does what you think it will.

EDIT: It looks like SMO versions before SQL Server 2012 (i.e. 2005, 2008, and 2008 R2) support managing SQL Server 2000 instances. I'm finding this information through this BOL reference.

Best Answer

Possible solutions:

Related Solutions

Sql-server – Getting a “Request to run job refused because the job is already running” When no job is running

Sql-server – Run Multiple Remote Jobs

Related Question