SQL Server Performance – Impact of Updating Many Columns in a MERGE Statement

mergesql-server-2012

I have a merge statement that either needs to insert about 65 rows worth of data, or update rows. At the moment I'm updating all columns when a match is found. But I don't need to.

Would there be a significant performance increase if i only updated a few columns on WHEN MATCHED THEN UPDATE vs updating all columns?

Best Answer

My first thought was: no, the whole row will get re-written whether you update one column or many, in both the log and data files. This should be true for simple un-indexed columns at least but there could be complications.

It may make a difference if the "extra" columns are indexed: will the engine be bright enough to know it doesn't need to make changes for what is effectively no change? Of course if the update results in the row moving to a different page because it is variable length and the update causes it to no longer fit forcing a page split, then all the indexes will need updating anyway. Another complication might be your own code in triggers if your database engine supports identifying which columns were altered (UPDATE() and COLUMNS_UPDATED() in SQL Server): if your code is dependent on that does the engine identify which columns really changed or just list those that were written to?

This will all be dependent on which database engine you are using, so you will need to specify that in your question text and/or tags to get less generic advice.

Related Solutions

Sql-server – What are the locking options for a Merge statement

You need to give the query processor a more efficient access path to locate StudentTotalMarks records. As written, the query requires a full scan of the table with a residual predicate [StudentID] = [@StudentId] applied to each row:

Scan Plan

The engine takes U (update) locks when reading as a basic defence against a common cause of conversion deadlocks. This behaviour means the second execution blocks when trying to obtain a U lock on the row already locked with an X (exclusive) lock by the first execution.

The following index provides a better access path, avoiding taking unnecessary U locks:

CREATE UNIQUE INDEX uq1 
ON dbo.StudentTotalMarks (StudentID) 
INCLUDE (StudentMarks);

The query plan now includes a seek operation on StudentID = [@StudentId], so U locks are only requested on target rows:

Seek Plan

The index is not required to be UNIQUE to solve the issue at hand (though the INCLUDE is required to make it a covering index for this query).

Making StudentID the PRIMARY KEY of the StudentTotalMarks table would also solve the access path problem (and the apparently redundant Id column could be removed). You should always enforce alternate keys with a UNIQUE or PRIMARY KEY constraint (and avoid adding meaningless surrogate keys without good reason).

Sql-server – How to delete only related records in a multi-key MERGE in SQL Server

This is the separate DELETE operation I had in mind:

DELETE m
FROM dbo.Mapping AS m
WHERE EXISTS 
  (SELECT 1 FROM @Values WHERE LeftID = m.LeftID)
AND NOT EXISTS 
  (SELECT 1 FROM @Values WHERE LeftID = m.LeftID AND RightID = m.RightID);

As I outline here, for a left anti-semi join, the NOT EXISTS pattern will often outperform the LEFT JOIN / NULL pattern (but you should always test).

Not sure if your overall goal is clarity or performance, so only you can judge if this will work out better for your requirements than the NOT MATCHED BY source option. You'll have to look at the plans qualitatively, and the plans and/or runtime metrics quantitatively, to know for sure.

If you expect your MERGE command to protect you from race conditions that would happen with multiple independent statements, you better make sure that is true by changing it to:

MERGE dbo.Mapping WITH (HOLDLOCK) AS target

(From Dan Guzman's blog post.)

Personally, I would do all of this without MERGE, because there are unresolved bugs, among other reasons. And Paul White seems to recommend separate DML statements as well.

And here's why I added a schema prefix: you should always reference objects by schema, when creating, affecting, etc.

Best Answer

Related Solutions

Sql-server – What are the locking options for a Merge statement

Sql-server – How to delete only related records in a multi-key MERGE in SQL Server

Related Question