I want SQL Server to throw an error at me if my SOURCE
table contains duplicates as per the ON
clause in every case, including INSERT
, but it only does for the UPDATE
and DELETE
cases. Why doesn't it?
For example, it takes two executions of this statement to get any sort of validation error. First, let's define the table:
--assume this is a permanent table that starts empty:
CREATE TABLE #MyTable(KeyID1 INT, KeyID2 INT, SomeValue REAL)--no primary key on purpose for example
And the MERGE
statement I wish would throw the error at me on the first execution:
MERGE #MyTable AS TARGET
USING
(
SELECT
KeyID1 = 1
,KeyID2 = 1
,SomeValue = 1
UNION ALL SELECT
KeyID1 = 1
,KeyID2 = 1
,SomeValue = 2
) AS SOURCE ON
TARGET.KeyID1 = SOURCE.KeyID1
AND TARGET.KeyID2 = SOURCE.KeyID2
WHEN MATCHED THEN UPDATE SET
TARGET.SomeValue = SOURCE.SomeValue
WHEN NOT MATCHED BY TARGET THEN INSERT
(
KeyID1
,KeyID2
,SomeValue
)
VALUES
(
SOURCE.KeyID1
,SOURCE.KeyID2
,SOURCE.SomeValue
);
After the first execution, the contents of #MyTable
are
The error message that only comes on the second and subsequent executions:
Msg 8672, Level 16, State 1, Line 4
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.
The wording of this error message seems to imply my desired validation should be occurring already.
Of course, one obvious thing to do is make the primary key on the TARGET
table reject the INSERT
, but that is unrelated to my question.
P.S. I came across this list of MERGE
bugs that doesn't seem to apply here.
Best Answer
On the first run, there are no matches in the conditional check. That is, there are no rows in
#MyTable
that match the input rows.Note that the matching in a
MERGE
does not consider rows inserted during theMERGE
itself. If those rows could cause duplication, it is the responsibility of the developer to ensure that there are no potential conflicts in the source data. Since this is the case, and there are no constraints such as aPrimary Key
to conflict with the data being added, this insert succeeds for both rows.Of course, on the second run, there are two matching rows, and the
MERGE
fails with the error you showed.This is by design, and does speed the processing, as if the
MERGE
had to validate each row individually as they were considered (as opposed to set based validation performed by theMERGE
internally), then the process would be more like using a cursor to handle each row one at a time.How To Handle Source Duplicates
Using the simple case you have above, I would modify it like this to handle source duplication: