SQL Server 2012 – Most Efficient Way to Compare Two Large Result Sets

exceptsql serversql-server-2012

Current advice for the most efficient way to compare two large result/row sets seems to be to use the EXCEPT operator. This self contained SQL script below gets very inefficient as row sizes increase (change @last values). I have tried to find unique entries in a combined table but with no improvement.

DECLARE @first AS INT, @step AS INT, @last AS INT; 

-- This script is comparing two record sets using EXCEPT
-- I want to find additions from OLD to NEW
-- As number of rows increase performance gets terrible
-- I don't have to use two tables. I could use one combined table but I want the same result as quickly as possible

-- Compare 100 to 110 rows - 0 seconds
-- Compare 1000 to 1010 rows - 1 seconds
-- Compare 10000 to 10010 rows - 16 seconds
-- Compare 100000 to 100010 rows - ABORT after 8 minutes (tables are populated in 18 seconds)

DECLARE @temptableOLD TABLE ([Result1] int);
SET @step = 1;  SET @first = 1; SET @last = 100000
WHILE(@first <= @last) BEGIN INSERT INTO @temptableOLD VALUES(@first) SET @first += @step END

DECLARE @temptableNEW TABLE ([Result1] int);
SET @step = 1;  SET @first = 1; SET @last = 100010
WHILE(@first <= @last) BEGIN INSERT INTO @temptableNEW VALUES(@first) SET @first += @step END

select * from @temptableNEW
except
select * from @temptableOLD

Best Answer

EXCEPT implies a DISTINCT operation.

I would use NOT EXISTS if this is not actually required.

However the problem you are having is likely that you are getting a nested loops on an unindexed table due to the poor cardinality estimates associated with table variables.

select * from @temptableNEW
except
select * from @temptableOLD
OPTION (RECOMPILE)

Will be able to take account that the tables have 100K rows each and give a different plan.

In SQL Server 2012 you can only add indexes to table variables via constraints. If the values are unique you could use

DECLARE @temptableOLD TABLE ([Result1] int UNIQUE CLUSTERED);

to add an index. If done on both tables the plan (after the recompile hint is added) will likely use a merge join instead. Without any indexes I would expect a hash join.