Sql-server – SQL Server – MERGE duplicate SOURCE rows validation is only evaluated on UPDATE and DELETE clauses

data validationmergesql serversql server 2014unique-constraint

I want SQL Server to throw an error at me if my SOURCE table contains duplicates as per the ON clause in every case, including INSERT, but it only does for the UPDATE and DELETE cases. Why doesn't it?

For example, it takes two executions of this statement to get any sort of validation error. First, let's define the table:

--assume this is a permanent table that starts empty:
CREATE TABLE #MyTable(KeyID1 INT, KeyID2 INT, SomeValue REAL)--no primary key on purpose for example

And the MERGE statement I wish would throw the error at me on the first execution:

MERGE #MyTable AS TARGET
USING
(
    SELECT
         KeyID1 = 1
        ,KeyID2 = 1
        ,SomeValue = 1
    UNION ALL SELECT
         KeyID1 = 1
        ,KeyID2 = 1
        ,SomeValue = 2
) AS SOURCE ON
    TARGET.KeyID1 = SOURCE.KeyID1
    AND TARGET.KeyID2 = SOURCE.KeyID2
WHEN MATCHED THEN UPDATE SET
    TARGET.SomeValue = SOURCE.SomeValue
WHEN NOT MATCHED BY TARGET THEN INSERT
    (
         KeyID1
        ,KeyID2
        ,SomeValue
    )
    VALUES
    (
         SOURCE.KeyID1
        ,SOURCE.KeyID2
        ,SOURCE.SomeValue
    );

After the first execution, the contents of #MyTable are

The error message that only comes on the second and subsequent executions:

Msg 8672, Level 16, State 1, Line 4
The MERGE statement attempted to UPDATE or DELETE the same row more than once. This happens when a target row matches more than one source row. A MERGE statement cannot UPDATE/DELETE the same row of the target table multiple times. Refine the ON clause to ensure a target row matches at most one source row, or use the GROUP BY clause to group the source rows.

The wording of this error message seems to imply my desired validation should be occurring already.

Of course, one obvious thing to do is make the primary key on the TARGET table reject the INSERT, but that is unrelated to my question.

_{P.S. I came across this list of MERGE bugs that doesn't seem to apply here.}

Best Answer

On the first run, there are no matches in the conditional check. That is, there are no rows in #MyTable that match the input rows.

Note that the matching in a MERGE does not consider rows inserted during the MERGE itself. If those rows could cause duplication, it is the responsibility of the developer to ensure that there are no potential conflicts in the source data. Since this is the case, and there are no constraints such as a Primary Key to conflict with the data being added, this insert succeeds for both rows.

Of course, on the second run, there are two matching rows, and the MERGE fails with the error you showed.

This is by design, and does speed the processing, as if the MERGE had to validate each row individually as they were considered (as opposed to set based validation performed by the MERGE internally), then the process would be more like using a cursor to handle each row one at a time.

How To Handle Source Duplicates

Using the simple case you have above, I would modify it like this to handle source duplication:

CREATE TABLE #MySource (        -- This represents the actual source of the data in the database
    KeyID1 int,
    KeyId2 int,
    SomeValue bigint
    )
INSERT INTO #MySource

    SELECT
         KeyID1 = 1
        ,KeyID2 = 1
        ,SomeValue = 1
    UNION ALL 
    SELECT
         1
        , 1
        , 2

-- Find the duplicates, and set up an error condition / report for them.

    SELECT 
        KeyID1,
        KeyId2,
        Count(*) as ct
    INTO #Duplicates
    FROM #MySource
    GROUP BY KeyID1, KeyId2
    HAVING COUNT(*) > 1

    IF EXISTS (Select * FROM #Duplicates)
    BEGIN 
        -- handle error here. Perform a Raiserror or THROW, or SELECT 
        -- the items from the duplicates and stop processing, or...
        Print 'Error - Duplicates!'
        SELECT *
        FROM #Duplicates
    END

    -- No duplicates, so continue
MERGE #MyTable AS TARGET
USING (
    SELECT *
    FROM #MySource m
    )
) AS SOURCE ON
    TARGET.KeyID1 = SOURCE.KeyID1
    AND TARGET.KeyID2 = SOURCE.KeyID2
WHEN MATCHED THEN UPDATE SET
    TARGET.SomeValue = SOURCE.SomeValue
WHEN NOT MATCHED BY TARGET THEN INSERT
    (
         KeyID1
        ,KeyID2
        ,SomeValue
    )
    VALUES
    (
         SOURCE.KeyID1
        ,SOURCE.KeyID2
        ,SOURCE.SomeValue
    );

Related Solutions

Sql-server – How to delete only related records in a multi-key MERGE in SQL Server

This is the separate DELETE operation I had in mind:

DELETE m
FROM dbo.Mapping AS m
WHERE EXISTS 
  (SELECT 1 FROM @Values WHERE LeftID = m.LeftID)
AND NOT EXISTS 
  (SELECT 1 FROM @Values WHERE LeftID = m.LeftID AND RightID = m.RightID);

As I outline here, for a left anti-semi join, the NOT EXISTS pattern will often outperform the LEFT JOIN / NULL pattern (but you should always test).

Not sure if your overall goal is clarity or performance, so only you can judge if this will work out better for your requirements than the NOT MATCHED BY source option. You'll have to look at the plans qualitatively, and the plans and/or runtime metrics quantitatively, to know for sure.

If you expect your MERGE command to protect you from race conditions that would happen with multiple independent statements, you better make sure that is true by changing it to:

MERGE dbo.Mapping WITH (HOLDLOCK) AS target

(From Dan Guzman's blog post.)

Personally, I would do all of this without MERGE, because there are unresolved bugs, among other reasons. And Paul White seems to recommend separate DML statements as well.

And here's why I added a schema prefix: you should always reference objects by schema, when creating, affecting, etc.

Sql-server – Way to do MERGE with update source when target not match

Updates are tracked by MERGE by providing:

the $Action value UPDATE;
the old values in the memory table deleted; and
the new values in the memory table inserted.

Therefore both in the case where a match is found, and in the case where there is no match by target, the values of the identity column is available in the inserted memory table.

Best Answer

Related Solutions

Sql-server – How to delete only related records in a multi-key MERGE in SQL Server

Sql-server – Way to do MERGE with update source when target not match

Related Question