Referring to this link: Optimizing MERGE Statement Performance
The following criteria are given for ideal performance:
- Create an index on the join columns in the source table that is unique and covering.
- Create a unique clustered index on the join columns in the target table.
I have many large tables (100+ million rows) that I merge 1-5 million rows into on a regular basis. There is a unique clustered index on the target table which matches the keys on the source table, satisfying the second criteria.
There is a bulk import process that populates the source table over time. I then place a non-unique clustered index on it in preparation for merging – this is because there is no way to ensure that duplicate values will not be inserted during the bulk process. This only sorta-half satisfies the first criteria.
I use a CTE with deduplication logic as the source for my merge –
...
USING
( SELECT
cte.Key1,
cte.Key2,
cte.Key3,
cte.RestOfTheColumns
FROM
( SELECT
Key1,
Key2,
Key3,
RestOfTheColumns,
,ROW_NUMBER() OVER (PARTITION BY Key1, Key2, Key3 ORDER BY Key1, Key2, Key3) AS RowNumber
) cte
WHERE cte.RowNumber = 1
)
...
Ideally, I believe, this would already exist in a uniquely indexed table.
In summary:
- I put a clustered index on the source then deduplicate it in-query.
- In the query analyzer the Clustered Index Merge takes 91.9% of the plan cost.
Does it matter that the CTE/underlying table does not have an explicit unique constraint? There are no duplicates, but SQL doesn't know that, so I imagine it would check for every line in the upsert.
Best Answer
If you have a clustered (or covering) index starting with (Key1, Key2, Key3, ...) on the table that the CTE is querying, this should be a well-performing query.
Can you add a "Key4", preferably an identity column, in the source table to your non-unique clustered index, so you perhaps can make the index unique? That way, you could also set "Key4" as your
ORDER BY
in theROW_NUMBER()
. This might improve performance, but it's just a guess.The fact that the Clustered Index Merge operator is over 90% of your plan doesn't by itself indicate a performance problem, it just says that there's not that much else to do in the query plan apart from the join. In fact, it may indicate that you have a well-optimized plan - when you have the proper indexes, a
ROW_NUMBER()
can be very efficient.