Sql-server – What events cause each run_status code to appear in sysjobhistory

jobssql server

We have been asked to report on the health of our SQL jobs. I am having trouble understanding the run_status column in dbo.sysjobhistory. The MSDB page for sysjobhistory gives this description:

Status of the job execution:

  • 0 = Failed
  • 1 = Succeeded
  • 2 = Retry
  • 3 = Canceled

What events cause each status code?

Here is what I have done to attempt to produce each status code. I was able to reproduce…

  • 0 (Failed) by writing SQL that throws an error.
    • For example, step 1 is SELECT SELECT
  • 1 (Succeeded) by writing SQL that succeeds.
    • For example, step 1 is SELECT 1
  • 3 (Cancelled) by stopping a job while it was running
    enter image description here
  • 4 (In progress) by querying sysjobshistory while a job was running

I know a way produce 0's, 1's, 3's, and 4's, but I don't know all of the ways to produce them, and I have no idea how to cause a 2.

The health of our jobs is binary. We are either healthy or not healthy, because the end user does not want to see anything more granular than that on their dashboard. That is why I am asking this question – I have to interpret the status codes and run durations and draw a single, binary conclusion about the health of the SQL jobs. The problem is that MSDN doesn't explain what the status codes mean, or more importantly, what events might cause them.

For example, if a user cancels a job (resulting in a run_status of 3), we might decide that we don't care and just report that we are healthy. However, if a job gets cancelled because of some other event, we might care about that.

Best Answer

Create a job with a single step. On the advanced tab for that step, change retry attempts from 0 to something greater than 0. In the job step, have it be your SELECT SELECT. Run the job. You will see an entry .

I don't think anyone could provide a full list of all scenarios that could lead to the various possible status values. Did you try anything like this while a job was running (and it's easy to test this with a simple job that has a loop with SELECT TOP (1) name FROM somedb.sys.all_objects;WAITFOR DELAY '00:00:01';); things like:

  • unplugging the computer
  • failing over (cluster or AG)
  • setting somedb to single_user or offline
  • killing sqlservr.exe process or shutting down the agent service
  • having the step raising batch- and connection-aborting errors
  • having the step raise an exception that causes "severe error" or assertion check failures (assuming there are some that exist and you can repro on your version)

run_status is useful, but you can get a better picture from the combination of run_status and message (and, in severe cases, the SQL Server error log). The trouble with message is that the contents can get truncated, so if you have a step that performs 400 successful backups and then one fails, the content will be full of successful messages, and the failure will be missing. For important jobs you may want to log to a table or to a file; you can see a variety of options on the Advanced tab of any job step.