SQL Server INSERT OUTPUT Clause – Is It Safe to Rely on Order?

bulk-insertidentityinsertsql serversql-server-2012

Given this table:

CREATE TABLE dbo.Target (
   TargetId int identity(1, 1) NOT NULL,
   Color varchar(20) NOT NULL,
   Action varchar(10) NOT NULL, -- of course this should be normalized
   Code int NOT NULL,
   CONSTRAINT PK_Target PRIMARY KEY CLUSTERED (TargetId)
);

In two slightly different scenarios I want to insert rows and return the values from the identity column.

Scenario 1

INSERT dbo.Target (Color, Action, Code)
OUTPUT inserted.TargetId
SELECT t.Color, t.Action, t.Code
FROM
   (VALUES
      ('Blue', 'New', 1234),
      ('Blue', 'Cancel', 4567),
      ('Red', 'New', 5678)
   ) t (Color, Action, Code)
;

Scenario 2

CREATE TABLE #Target (
   Color varchar(20) NOT NULL,
   Action varchar(10) NOT NULL,
   Code int NOT NULL,
   PRIMARY KEY CLUSTERED (Color, Action)
);

-- Bulk insert to the table the same three rows as above by any means

INSERT dbo.Target (Color, Action, Code)
OUTPUT inserted.TargetId
SELECT t.Color, t.Action, t.Code
FROM #Target
;

Question

Can I rely on the returned identity values from the dbo.Target table insert to be returned in the order they existed in the 1) VALUES clause and 2) #Target table, so that I can correlate them by their position in the output rowset back to the original input?

For Reference

Here is some trimmed-down C# code that demonstrates what's happening in the application (scenario 1, soon to be converted to use SqlBulkCopy):

public IReadOnlyCollection<Target> InsertTargets(IEnumerable<Target> targets) {
   var targetList = targets.ToList();
   const string insertSql = @"
      INSERT dbo.Target (
         CoreItemId,
         TargetDateTimeUtc,
         TargetTypeId,
      )
      OUTPUT
         Inserted.TargetId
      SELECT
         input.CoreItemId,
         input.TargetDateTimeUtc,
         input.TargetTypeId,
      FROM
         (VALUES
            {0}
         ) input (
            CoreItemId,
            TargetDateTimeUtc,
            TargetTypeId
         );";
   var results = Connection.Query<DbTargetInsertResult>(
      string.Format(
         insertSql,
         string.Join(
            ", ",
            targetList
               .Select(target => $@"({target.CoreItemId
                  }, '{target.TargetDateTimeUtc:yyyy-MM-ddTHH:mm:ss.fff
                  }', {(byte) target.TargetType
                  })";
               )
         )
      )
      .ToList();
   return targetList
      .Zip( // The correlation that relies on the order of the two inputs being the same
         results,
         (inputTarget, insertResult) => new Target(
            insertResult.TargetId, // with the new TargetId to replace null.
            inputTarget.TargetDateTimeUtc,
            inputTarget.CoreItemId,
            inputTarget.TargetType
         )
      )
      .ToList()
      .AsReadOnly();
}

Best Answer

Can I rely on the returned identity values from the dbo.Target table insert to be returned in the order they existed in the 1) VALUES clause and 2) #Target table, so that I can correlate them by their position in the output rowset back to the original input?

No, you can't rely on anything to be guaranteed without an actual documented guarantee. The documentation explicitly states there is no such guarantee.

SQL Server does not guarantee the order in which rows are processed and returned by DML statements using the OUTPUT clause. It is up to the application to include an appropriate WHERE clause that can guarantee the desired semantics, or understand that when multiple rows may qualify for the DML operation, there is no guaranteed order.

This would rely on a lot of undocumented assumptions

  1. The order the rows are output from the constant scan is in the same order as the values clause (I've never seen them differ but AFAIK this is not guaranteed).
  2. The order the rows are inserted will be the same as the order they are output from the constant scan (definitely not always the case).
  3. If using a "wide" (per index) execution plan the values from the output clause will be pulled from the clustered index update operator and not that of any secondary indexes.
  4. That the order is guaranteed to be preserved thereafter - e.g. when packaging rows up for transmission over the network.
  5. That even if the order appears predictable now implementation changes to features such as parallel insert won't change the order in the future (currently if the OUTPUT clause is specified in the INSERT…SELECT statement to return results to the client, then parallel plans are disabled in general, including INSERTs)

An example of point two failing (assuming clustered PK of (Color, Action)) can be seen if you add 600 rows to the VALUES clause. Then the plan has a sort operator before the insert so losing your original order in the VALUES clause.

There is a documented way of achieving your goal though and this is to add a numbering to the source and use MERGE instead of INSERT

MERGE dbo.Target
USING (VALUES (1, 'Blue', 'New', 1234),
              (2, 'Blue', 'Cancel', 4567),
              (3, 'Red', 'New', 5678) ) t (SourceId, Color, Action, Code)
ON 1 = 0
WHEN NOT MATCHED THEN
  INSERT (Color,
          Action,
          Code)
  VALUES (Color,
          Action,
          Code)
OUTPUT t.SourceId,
       inserted.TargetId; 

enter image description here

@a_horse_with_no_name

Is the merge really necessary? Couldn't you just do an insert into ... select ... from (values (..)) t (...) order by sourceid?

Yes you could. Ordering guarantees in SQL Server… states that

INSERT queries that use SELECT with ORDER BY to populate rows guarantees how identity values are computed but not the order in which the rows are inserted

So you could use

INSERT dbo.Target (Color, Action, Code)
OUTPUT inserted.TargetId
SELECT t.Color, t.Action, t.Code
FROM
(VALUES (1, 'Blue', 'New', 1234),
        (2, 'Blue', 'Cancel', 4567),
        (3, 'Red', 'New', 5678) ) t (SourceId, Color, Action, Code)
ORDER BY t.SourceId

enter image description here

This would guarantee that the identity values are assigned in order of t.SourceId but not that they are output in any particular order or that the identity column values assigned have no gaps (e.g. if a concurrent insert is attempted).