SQL Server Merge – How to Update and Insert Rows in SQL Server 2008 R2

mergesql serversql-server-2008-r2upsert

I have a table in the database that I need to update based on the records being passed to it. I am given a table-valued-parameter that has the same schema as the existing table (for all intensive purposes). The requirements are:

  • If the TVP (source) contains a record with existing ID in the existing table (target) update that record, if it is not marked as deleted in the existing table and its modified date is more recent.

    • Conditions: source.Id = target.Id AND target.IsDeleted = 0
      AND source.Modified > target.Modified
  • If the TVP contains a record for which the ID does not exist in the existing table, insert the record to the table.

I have the following set up so far.

EDITED: This setup works to give the desired end result. I, however, don't believe this is the best way of getting the intended result. There are two select statements that calculate the source table for the Merge statement. I would like to avoid having to filter out the source table in this manner.

My question is this: is there a way to achieve the same result as the query below but without having to use two select statements as is currently used in the Merge statement?

IF OBJECT_ID('tempdb..#target') IS NOT NULL DROP TABLE #target    
IF OBJECT_ID('tempdb..#source') IS NOT NULL DROP TABLE #source

CREATE TABLE #target  ( Id int, Modified int, IsDeleted bit, NoteText varchar(500) )    
INSERT #target VALUES (1, 1, 0, '') 
INSERT #target VALUES (2, 1, 1, '') 
INSERT #target VALUES (3, 3, 0, '') 

CREATE TABLE #source ( Id int, Modified int, NoteText varchar(500) )     
INSERT #source VALUES (1, 2, 'Modified') 
INSERT #source VALUES (2, 2, 'Modified')
INSERT #source VALUES (3, 2, 'Modified')
INSERT #source VALUES (4, 2, 'Modified') 

-- ** NEED HELP WITH OPTIMIZATION HERE **
merge #target as t 
using (select * from #source s where s.id not in (select id from #target where IsDeleted = 1 OR Modified > s.Modified)) as source
on t.id=source.id 
when matched then update set Modified=source.Modified, NoteText = source.NoteText 
when not matched then insert values (id, Modified, 0 ,NoteText) ;

SELECT * FROM #target; 

-- Expected records in #target after operations
-- 1,2,0,'Modified'  ** UPDATED **         
-- 2,1,1,''          ** NOT UPDATED BECAUSE ISDELETED = 1 **
-- 3,3,0,''          ** NOT UPDATED BECAUSE TARGET.MODIFIED > SOURCE.MODIFIED **
-- 4,2,0,'Modified'  ** INSERTED **

drop table #target;
drop table #source

Best Answer

A simpler way to write the MERGE is as follows:

MERGE #target /* WITH (SERIALIZABLE) */ AS T
USING #source AS S
    ON S.Id = T.Id
WHEN MATCHED
    AND T.IsDeleted = 0
    AND S.Modified > T.Modified
    THEN UPDATE SET
        Modified = S.Modified,
        NoteText = S.NoteText
WHEN NOT MATCHED THEN
    INSERT (Id, Modified, IsDeleted, NoteText)
    VALUES (S.Id, S.Modified, 0, S.NoteText);

The SERIALIZABLE hint would be necessary to avoid a potential race condition. This consideration obviously does not apply to a local temporary table target, but it would be remiss of me not to mention.

In broad terms, it is almost always wrong to perform any sort of filtering in the ON clause (or equivalently, in a CTE). The ON clause should only be used to specify how rows between the source and target are related.

See Inserting, Updating, and Deleting Data by Using Merge for more details.

The source and target tables should also have a unique index or key. Moreover, for best performance SQL Server should be able to guarantee that the MERGE statement will affect each target row a maximum of once. Without that, the execution plan will have to detect multiple DML operations on the same row in order to raise a runtime error if it happens. This often requires a Sort, and always requires a Segment, Sequence Project, and Assert. None of these will make your command execute any faster :)