Usually UPDATE then INSERT. Put simply, it's less work.
In this case, you have an ID column (IDENTITY): I'll assume this is the clustered index.
If you delete rows, you leave gaps in pages, which means fragmentation. If you add rows, you probably need more pages allocated. Other processes are doing this too.
An UPDATE will update in situ, and you'll have a less expensive INSERT because there are fewer rows.
Having said that...
If your new:update ratio is 100:1 then it doesn't really matter, of course. And the EXISTS is required too.
However, from a raw "shifting data" perspective, UPDATE..INSERT would be my choice.
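A minimal sketch of the UPDATE..INSERT pattern, under stated assumptions: dbo.Target, dbo.Staging, BusinessKey and Payload are hypothetical names that do not come from the question, and dbo.Target is assumed to have an IDENTITY ID column as its clustered index. It takes no special locking precautions, so treat it as an illustration of the order of operations rather than a production-ready upsert.

```sql
-- Hypothetical schema (assumption, for illustration only):
--   dbo.Target  (ID INT IDENTITY PRIMARY KEY CLUSTERED, BusinessKey INT, Payload VARCHAR(100))
--   dbo.Staging (BusinessKey INT, Payload VARCHAR(100))
BEGIN TRANSACTION;

-- Step 1: update the rows that already exist, in place.
UPDATE t
SET    t.Payload = s.Payload
FROM   dbo.Target AS t
INNER JOIN dbo.Staging AS s
        ON s.BusinessKey = t.BusinessKey;

-- Step 2: insert only the rows that are not in the target yet.
-- This is where the EXISTS check comes in.
INSERT dbo.Target (BusinessKey, Payload)
SELECT s.BusinessKey, s.Payload
FROM   dbo.Staging AS s
WHERE  NOT EXISTS
(
    SELECT 1
    FROM   dbo.Target AS t
    WHERE  t.BusinessKey = s.BusinessKey
);

COMMIT TRANSACTION;
```

The existing rows keep their ID values and their place in the clustered index, and the INSERT only has to touch the genuinely new rows.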
You can also do this with dynamic SQL without having to manually build out all the column names.
DECLARE @sql NVARCHAR(MAX), @c1 NVARCHAR(MAX), @c2 NVARCHAR(MAX);
SELECT @c1 = N'', @c2 = N'';
SELECT
@c1 = @c1 + ',' + QUOTENAME(name),
@c2 = @c2 + ' AND m.' + QUOTENAME(name) + ' = s.' + QUOTENAME(name)
FROM sys.columns
WHERE name <> 'LocationID'
AND [object_id] = OBJECT_ID('dbo.table1');
SET @sql = ';WITH s AS (
SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1
EXCEPT
SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1_master
)
SELECT m.LocationID
FROM s INNER JOIN dbo.table1 AS m ON 1 = 1
' + @c2;
SELECT @sql;
--EXEC sp_executesql @sql;
You can take the output of this query as is and store the query somewhere, or you can comment out the SELECT and uncomment the EXEC and leave it as permanent dynamic SQL; in this case it will automatically adapt to column changes in the two tables.
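To make that concrete, here is roughly what SELECT @sql should print, assuming (hypothetically) that dbo.table1 has just the columns LocationID, setting1 and setting2, and that dbo.table1_master has matching setting1 and setting2 columns. The real output depends on your actual columns, and the column order is not guaranteed.

```sql
-- Generated statement for the assumed columns setting1 and setting2
;WITH s AS (
SELECT [setting1],[setting2] FROM dbo.table1
EXCEPT
SELECT [setting1],[setting2] FROM dbo.table1_master
)
SELECT m.LocationID
FROM s INNER JOIN dbo.table1 AS m ON 1 = 1
 AND m.[setting1] = s.[setting1] AND m.[setting2] = s.[setting2]
```

The ON 1 = 1 is only there so that every generated predicate can start with AND without special-casing the first column.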
Another idea (assuming LocationID is unique) - and it occurred to me you may want to include the master row so you can quickly spot the columns that are different:
;WITH c AS
(
SELECT t.LocationID, m.setting1, m.setting2, ...
FROM dbo.table1 AS t CROSS JOIN dbo.table1_master AS m
)
SELECT DISTINCT src = '> master', setting1, setting2, ...
FROM c
UNION ALL
(
SELECT RTRIM(LocationID), setting1, setting2, ...
FROM dbo.table1
EXCEPT
SELECT RTRIM(LocationID), setting1, setting2, ...
FROM c
)
ORDER BY src;
This version is a little cheaper (mostly by avoiding the DISTINCT against the master table, at the cost of needing to specify all of the columns one more time, which again you can automate as per above):
;WITH m AS
(
SELECT setting1, setting2, ...
FROM dbo.table1_master
),
c AS
(
SELECT src = RTRIM(t.LocationID), m.setting1, m.setting2, ...
FROM dbo.table1 AS t CROSS JOIN m
)
SELECT src = '> master', setting1, setting2, ...
FROM m
UNION ALL
(
SELECT RTRIM(LocationID), setting1, setting2, ...
FROM dbo.table1
EXCEPT
SELECT src, setting1, setting2, ...
FROM c
)
ORDER BY src;
However, all of these options are poorer performers with worse plans than Rachel's simple LEFT JOIN. I tried to stick to the theme of using EXCEPT even though it is more about syntax than performance.
The key takeaway is that if the column count is too high to deal with manually, you can use the dynamic SQL approach above to construct whatever query you want to use - and you can do that one time and store the result, or have the code generated every time. To generate Rachel's query using dynamic SQL, not much needs to change:
DECLARE @sql NVARCHAR(MAX), @and NVARCHAR(MAX), @anycol NVARCHAR(128);
SELECT @sql = N'', @and = N'';
SELECT @and = @and + ' AND t.' + QUOTENAME(name) + ' = m.' + QUOTENAME(name)
FROM sys.columns
WHERE [object_id] = OBJECT_ID('dbo.table1_master');
SELECT TOP (1) @anycol = QUOTENAME(name)
FROM sys.columns
WHERE [object_id] = OBJECT_ID('dbo.table1_master')
ORDER BY name;
SET @sql = 'SELECT locationID
FROM dbo.table1 AS t
LEFT OUTER JOIN dbo.table1_master AS m ON 1 = 1'
+ @and + ' WHERE m.' + @anycol + ' IS NULL;';
SELECT @sql;
--EXEC sp_executesql @sql;
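For reference, with a hypothetical dbo.table1_master that contains only the columns setting1 and setting2, the script above should produce a statement along these lines (line breaks added here for readability; your column list will differ):

```sql
-- Generated statement for the assumed columns setting1 and setting2
SELECT locationID
FROM dbo.table1 AS t
LEFT OUTER JOIN dbo.table1_master AS m ON 1 = 1
  AND t.[setting1] = m.[setting1]
  AND t.[setting2] = m.[setting2]
WHERE m.[setting1] IS NULL;
```

The @anycol column is only used for the IS NULL test that detects "no matching master row", so any column of the master table would do as long as it does not allow NULLs.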
Best Answer
Use EXCEPT: see SqlFiddle.
The VALUES constructor will only work on SQL Server 2008 or later. For 2005, use the alternative detailed in this SO answer.
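The code from this answer did not survive here, so the following is only a sketch of one way EXCEPT can be used for this comparison, not necessarily the original query. The sample rows are made up, and the column names setting1 and setting2 are borrowed from the other answers. The VALUES row constructor is used just to build the test data, which is why it needs SQL Server 2008 or later.

```sql
-- Made-up sample data (assumption): one master row and three locations.
;WITH table1 AS
(
    SELECT LocationID, setting1, setting2
    FROM (VALUES (1, 'a', 'b'),
                 (2, 'a', 'c'),
                 (3, 'x', 'b')) AS v (LocationID, setting1, setting2)
),
table1_master AS
(
    SELECT setting1, setting2
    FROM (VALUES ('a', 'b')) AS v (setting1, setting2)
)
SELECT t.LocationID
FROM table1 AS t
WHERE EXISTS
(
    -- The current row's settings, minus the master settings:
    -- anything left over means the row differs from the master.
    SELECT t.setting1, t.setting2
    EXCEPT
    SELECT m.setting1, m.setting2
    FROM table1_master AS m
);
```

With this sample data, locations 2 and 3 are returned. Unlike a column-by-column equality join, EXCEPT treats two NULLs as matching, which may or may not be what you want.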