SQL Server Merge Performance – Troubleshooting Slow MERGE Operations

data-warehouse, merge, replication, sql-server, sql-server-2008

I created MERGE statements to transfer the differences, with a minimal set of required columns, from the production database into a staging database. The merged tables will be used by various reporting and analysis scenarios (read-only).
The whole process of getting the differences into the staging tables should be fast, at least a lot faster than the current approach of reloading the entire table contents every day with a DELETE followed by an INSERT INTO, regardless of how much data actually changed (which I estimate to be about 5%).

Here is a sample:

SET TRANSACTION ISOLATION LEVEL READ COMMITTED;

SET NOCOUNT ON

IF NOT exists(select 1 from StagingDB.sys.tables t join StagingDB.sys.schemas s on t.schema_id = s.schema_id WHERE t.name like 'A' and s.name like 'staging') 
BEGIN
        SELECT Top 0 [ID],[OID]
        INTO [StagingDB].[staging].[A]
        From [LiveDB].[dbo].[A];
END;

MERGE INTO [StagingDB].[staging].[A] AS Target
USING ( 
    SELECT [ID], [OID] From [LiveDB].[dbo].[A] where X = 0
) AS Source ([ID],[OID])
ON (Target.[ID] = Source.[ID])
WHEN MATCHED AND (Target.[OID] <> Source.[OID]) THEN
    UPDATE SET [OID] = Source.[OID]
WHEN NOT MATCHED BY TARGET  THEN
    INSERT([ID],[OID]) VALUES(Source.[ID],Source.[OID])
WHEN NOT MATCHED BY SOURCE THEN 
    DELETE;


IF NOT EXISTS(
    SELECT * FROM [StagingDB].sys.indexes ix WHERE ix.name = 'PK_A' AND ix.object_id = OBJECT_ID('[StagingDB].[staging].[A]')
)
BEGIN
        ALTER TABLE [StagingDB].[staging].[A] add constraint PK_A primary key CLUSTERED ([ID])
END

This runs VERY QUICKLY for most of my tables. Unfortunately, there are two tables that have far more rows, and additionally many more of their columns are used later on in the reporting process, so I have to transfer much more data.

While I can run this kind of MERGE command for 71 tables in less than a minute, the two problematic MERGE statements run for about 2 hours each.
I have not yet been able to find out why.

I suspect the amount of data, but also the way I compare the differences between the tables. To find out whether I have to run an UPDATE, I do a comparison for each column that is included in the MERGE statement. So in case I have 10 columns, the UPDATE condition of the MERGE looks like this:

WHEN MATCHED AND (Target.[OID] <> Source.[OID] OR Target.[OID1] <> Source.[OID1] OR Target.[Text1] <> Source.[Text1] OR Target.[varcharlong1] <> Source.[varcharlong1]) etc...

So this kind of update condition checks all columns and cannot be supported by an index. The staging tables always have a clustered primary key on a uniqueidentifier column and (so far) no additional indexes.
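
For what it's worth, the same check could apparently also be written as a single row comparison with EXISTS/EXCEPT instead of the long OR chain; this is just a sketch using the same placeholder column names, and I have not measured whether it behaves any differently here:

WHEN MATCHED AND EXISTS (
    SELECT Source.[OID], Source.[OID1], Source.[Text1], Source.[varcharlong1]
    EXCEPT
    SELECT Target.[OID], Target.[OID1], Target.[Text1], Target.[varcharlong1]
) THEN
    UPDATE SET [OID] = Source.[OID] -- etc...

As far as I understand, EXCEPT treats NULLs as equal, so changes to or from NULL would also be picked up, which the plain <> comparisons miss.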

My question is: while the MERGE statement runs very quickly and efficiently for most of our tables, it looks like this procedure is not suitable for the two larger tables with many columns.
Am I using the MERGE statement correctly in this case, using the OR conditions to make sure only changed rows are updated, or is there a better way?
Is there a better way to quickly get the delta (the changes) for a subset of columns into a staging table?
All of my tables do have a ROWVERSION column; maybe this could somehow be used to figure out which rows have changed?
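
Something like the following is roughly what I have in mind for the ROWVERSION idea (only a sketch: staging.SyncAnchor is a watermark table that does not exist yet, and [RowVersionCol] stands for the actual ROWVERSION column name):

DECLARE @LastSync binary(8);

-- last transferred ROWVERSION value for this table (first run / NULL handling omitted)
SELECT @LastSync = LastRowVersion
FROM [StagingDB].[staging].[SyncAnchor]
WHERE TableName = 'A';

-- only rows inserted or updated in the source since the last transfer
SELECT [ID], [OID]
FROM [LiveDB].[dbo].[A]
WHERE X = 0
  AND [RowVersionCol] > @LastSync;

-- afterwards the watermark would be advanced,
-- but deleted source rows would still need separate handling
UPDATE [StagingDB].[staging].[SyncAnchor]
SET LastRowVersion = (SELECT MAX([RowVersionCol]) FROM [LiveDB].[dbo].[A])
WHERE TableName = 'A';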

Best Answer

Don't use MERGE; it has always had performance problems on large tables. This has been an ongoing issue ever since the MERGE keyword was introduced into SQL Server. See Microsoft Connect and one other site below:

https://connect.microsoft.com/SQLServer/feedback/details/635778/not-matched-and-matched-parts-of-a-sql-merge-statement-are-not-optimized

http://www.sqlservercentral.com/Forums/Topic1465931-391-1.aspx

OK, looking at your code, it looks like you're trying to do everything in a single pass (which is often a good idea). In this case you're combining WHEN NOT MATCHED BY TARGET and WHEN NOT MATCHED BY SOURCE, which forces a full outer join between source and target so that matched and unmatched rows can be handled in a single pass. Individual UPDATE, INSERT and DELETE statements do not suffer from this issue, so they often perform better.
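
As a rough sketch of what that split could look like for your sample table (same source filter as in your MERGE; untested, so treat it as a starting point rather than a drop-in replacement):

-- 1) Update rows that exist in both tables and have changed
UPDATE t
SET t.[OID] = s.[OID]
FROM [StagingDB].[staging].[A] AS t
JOIN [LiveDB].[dbo].[A] AS s
    ON s.[ID] = t.[ID]
WHERE s.[X] = 0
  AND t.[OID] <> s.[OID];

-- 2) Insert source rows that are missing from the staging table
INSERT INTO [StagingDB].[staging].[A] ([ID], [OID])
SELECT s.[ID], s.[OID]
FROM [LiveDB].[dbo].[A] AS s
WHERE s.[X] = 0
  AND NOT EXISTS (SELECT 1 FROM [StagingDB].[staging].[A] AS t WHERE t.[ID] = s.[ID]);

-- 3) Delete staging rows that no longer exist in the (filtered) source
DELETE t
FROM [StagingDB].[staging].[A] AS t
WHERE NOT EXISTS (SELECT 1 FROM [LiveDB].[dbo].[A] AS s WHERE s.[ID] = t.[ID] AND s.[X] = 0);

Each statement then gets its own, simpler execution plan, and you can wrap the three statements in a transaction if the staging table must stay consistent while the load runs.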

Hope this helps.

Gianluca