Sql-server – SQL Server – MERGE duplicate SOURCE rows validation is only evaluated on UPDATE and DELETE clauses

data validationmergesql serversql server 2014unique-constraint

I want SQL Server to throw an error at me if my SOURCE table contains duplicates as per the ON clause in every case, including INSERT, but it only does for the UPDATE and DELETE cases. Why doesn't it?

For example, it takes two executions of this statement to get any sort of validation error. First, let's define the table:

--assume this is a permanent table that starts empty:
CREATE TABLE #MyTable(KeyID1 INT, KeyID2 INT, SomeValue REAL)--no primary key on purpose for example

And the MERGE statement I wish would throw the error at me on the first execution:

MERGE #MyTable AS TARGET
USING
(
    SELECT
         KeyID1 = 1
        ,KeyID2 = 1
        ,SomeValue = 1
    UNION ALL SELECT
         KeyID1 = 1
        ,KeyID2 = 1
        ,SomeValue = 2
) AS SOURCE ON
    TARGET.KeyID1 = SOURCE.KeyID1
    AND TARGET.KeyID2 = SOURCE.KeyID2
WHEN MATCHED THEN UPDATE SET
    TARGET.SomeValue = SOURCE.SomeValue
WHEN NOT MATCHED BY TARGET THEN INSERT
    (
         KeyID1
        ,KeyID2
        ,SomeValue
    )
    VALUES
    (
         SOURCE.KeyID1
        ,SOURCE.KeyID2
        ,SOURCE.SomeValue
    );

After the first execution, the contents of #MyTable are

sampe results

The error message that only comes on the second and subsequent executions:

Msg 8672, Level 16, State 1, Line 4
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

The wording of this error message seems to imply my desired validation should be occurring already.

Of course, one obvious thing to do is make the primary key on the TARGET table reject the INSERT, but that is unrelated to my question.

P.S. I came across this list of MERGE bugs that doesn't seem to apply here.

Best Answer

On the first run, there are no matches in the conditional check. That is, there are no rows in #MyTable that match the input rows.

Note that the matching in a MERGE does not consider rows inserted during the MERGE itself. If those rows could cause duplication, it is the responsibility of the developer to ensure that there are no potential conflicts in the source data. Since this is the case, and there are no constraints such as a Primary Key to conflict with the data being added, this insert succeeds for both rows.

Of course, on the second run, there are two matching rows, and the MERGE fails with the error you showed.

This is by design, and does speed the processing, as if the MERGE had to validate each row individually as they were considered (as opposed to set based validation performed by the MERGE internally), then the process would be more like using a cursor to handle each row one at a time.

How To Handle Source Duplicates

Using the simple case you have above, I would modify it like this to handle source duplication:

CREATE TABLE #MySource (        -- This represents the actual source of the data in the database
    KeyID1 int,
    KeyId2 int,
    SomeValue bigint
    )
INSERT INTO #MySource

    SELECT
         KeyID1 = 1
        ,KeyID2 = 1
        ,SomeValue = 1
    UNION ALL 
    SELECT
         1
        , 1
        , 2

-- Find the duplicates, and set up an error condition / report for them.

    SELECT 
        KeyID1,
        KeyId2,
        Count(*) as ct
    INTO #Duplicates
    FROM #MySource
    GROUP BY KeyID1, KeyId2
    HAVING COUNT(*) > 1

    IF EXISTS (Select * FROM #Duplicates)
    BEGIN 
        -- handle error here. Perform a Raiserror or THROW, or SELECT 
        -- the items from the duplicates and stop processing, or...
        Print 'Error - Duplicates!'
        SELECT *
        FROM #Duplicates
    END

    -- No duplicates, so continue
MERGE #MyTable AS TARGET
USING (
    SELECT *
    FROM #MySource m
    )
) AS SOURCE ON
    TARGET.KeyID1 = SOURCE.KeyID1
    AND TARGET.KeyID2 = SOURCE.KeyID2
WHEN MATCHED THEN UPDATE SET
    TARGET.SomeValue = SOURCE.SomeValue
WHEN NOT MATCHED BY TARGET THEN INSERT
    (
         KeyID1
        ,KeyID2
        ,SomeValue
    )
    VALUES
    (
         SOURCE.KeyID1
        ,SOURCE.KeyID2
        ,SOURCE.SomeValue
    );