Sql-server – An efficient way to compare two large data sets in SQL

except, performance, query-performance, sql-server, sql-server-2008-r2

Currently, I'm comparing two data sets that contain unique StoreKey/ProductKey combinations.

The 1st data set has the unique StoreKey/ProductKey combinations for sales between the beginning of January 2012 and the end of May 2014 (result = 450K rows). The 2nd data set has the unique StoreKey/ProductKey combinations for sales from the beginning of June 2014 until today (result = 190K rows).

I'm looking to find the StoreKey/ProductKey combinations that are in the 2nd set but not in the 1st set, i.e. new products sold from the beginning of June.

Up until now, I've dumped the two data sets into temp tables, created indexes on both keys for both tables, and used the EXCEPT operator to find the combinations that appear only in the 2nd set.
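For reference, the temp-table approach described above might look like the following minimal sketch. The source table #Sales, its SaleDate column, the sample rows, and the index names are all hypothetical stand-ins for the real schema; the half-open date ranges assume SaleDate may carry a time component.

```sql
-- Hypothetical source table with a few made-up sales rows.
CREATE TABLE #Sales (
    StoreKey   INT      NOT NULL,
    ProductKey INT      NOT NULL,
    SaleDate   DATETIME NOT NULL
);
INSERT INTO #Sales (StoreKey, ProductKey, SaleDate) VALUES
    (1, 10, '20130115'),   -- sold in the old period only
    (1, 20, '20140310'),   -- sold in both periods
    (1, 20, '20140702'),
    (2, 30, '20140815');   -- new since June 2014

-- Step 1: dump the distinct combinations of each period into temp tables.
SELECT DISTINCT StoreKey, ProductKey
INTO #OldCombos
FROM #Sales
WHERE SaleDate >= '20120101' AND SaleDate < '20140601';

SELECT DISTINCT StoreKey, ProductKey
INTO #NewCombos
FROM #Sales
WHERE SaleDate >= '20140601';

-- Step 2: index both keys on both temp tables (unique, since the rows are DISTINCT).
CREATE UNIQUE CLUSTERED INDEX IX_OldCombos ON #OldCombos (StoreKey, ProductKey);
CREATE UNIQUE CLUSTERED INDEX IX_NewCombos ON #NewCombos (StoreKey, ProductKey);

-- Step 3: combinations sold since June 2014 that were not sold before it.
SELECT StoreKey, ProductKey FROM #NewCombos
EXCEPT
SELECT StoreKey, ProductKey FROM #OldCombos;
```

Every step here copies data into tempdb before the actual comparison runs, which is the overhead the answer below questions.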

Is there a more efficient way of comparing data sets this large?

Best Answer

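Using EXCEPT is, in my opinion, the way to go here, but you might want to reconsider the temporary tables. With them you copy both data sets into tempdb and then build indexes on the copies, which is extra work that slows you down. If the indexes you need already exist on the source tables (as I suspect), just compare the appropriate SELECTs directly. Put the newer period first: EXCEPT returns the distinct rows of the left-hand query that do not appear in the right-hand one.

    -- dbo.Sales and SaleDate are placeholders for your own table and date column.
    -- Half-open ranges avoid missing rows with a time component on the last day.
    SELECT StoreKey, ProductKey FROM dbo.Sales
    WHERE SaleDate >= '20140601'                              -- June 2014 onward
    EXCEPT
    SELECT StoreKey, ProductKey FROM dbo.Sales
    WHERE SaleDate >= '20120101' AND SaleDate < '20140601';   -- Jan 2012 to end of May 2014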
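A self-contained sketch of the direct comparison, runnable in a scratch session. The temp table #SalesSrc and its rows are made-up sample data standing in for the real, already indexed source table. The sample includes a combination sold twice in the new period to show that EXCEPT already returns distinct rows, so neither DISTINCT nor intermediate temp tables are needed.

```sql
-- Hypothetical stand-in for the real source table.
CREATE TABLE #SalesSrc (
    StoreKey   INT      NOT NULL,
    ProductKey INT      NOT NULL,
    SaleDate   DATETIME NOT NULL
);
INSERT INTO #SalesSrc (StoreKey, ProductKey, SaleDate) VALUES
    (1, 10, '20130115'),
    (1, 20, '20140310'),
    (1, 20, '20140702'),
    (2, 30, '20140815'),
    (2, 30, '20140901');   -- same combination sold again in the new period

-- New period on the left: EXCEPT keeps the distinct left-hand rows
-- that have no match in the right-hand query.
SELECT StoreKey, ProductKey
FROM #SalesSrc
WHERE SaleDate >= '20140601'
EXCEPT
SELECT StoreKey, ProductKey
FROM #SalesSrc
WHERE SaleDate >= '20120101' AND SaleDate < '20140601';
-- Returns one row: StoreKey = 2, ProductKey = 30.
```

Swapping the two SELECTs answers a different question (combinations sold before June 2014 but not since); with this sample data that would return StoreKey = 1, ProductKey = 10.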

Related Solutions

Sql-server – Is outputting to both a result set and a temporary table possible

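Yes. Insert into the temporary table with INSERT ... SELECT and add an OUTPUT clause without INTO: the rows go into #t and are also returned to the caller as a result set. (OUTPUT is only allowed on INSERT, UPDATE, DELETE and MERGE, not on a plain SELECT.) Adjust the columns in the INSERT, OUTPUT and SELECT lists as you need.

    alter procedure GetFoos as
    BEGIN
        CREATE TABLE #t (FooId INT, BarId INT, FooData SYSNAME)

        -- Fill #t and, via OUTPUT without INTO, return the same rows as a result set
        insert into #t (FooId, BarId, FooData)
        OUTPUT INSERTED.FooId, INSERTED.BarId, INSERTED.FooData
        select FooId, BarId, FooData
        from Foo
        where FooId in (select FooId from FooQueue where IsQueued = 1)

        -- Queue the Bars that belong to the Foos just returned
        update BarQueue
        set IsQueued = 1
        where BarId in (select BarId from #t)

        exec GetBars

        delete FooQueue where IsQueued = 1
    end
    GO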
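A minimal, self-contained sketch of the same pattern outside a procedure: a single INSERT both fills a temp table and returns the inserted rows to the client. The tables #Source and #Copy and their rows are hypothetical.

```sql
-- Hypothetical source rows.
CREATE TABLE #Source (Id INT NOT NULL, Name NVARCHAR(50) NOT NULL);
INSERT INTO #Source (Id, Name) VALUES (1, N'alpha'), (2, N'beta'), (3, N'gamma');

CREATE TABLE #Copy (Id INT NOT NULL, Name NVARCHAR(50) NOT NULL);

-- The INSERT writes the rows into #Copy; OUTPUT without INTO
-- sends the same rows back to the client as a result set.
INSERT INTO #Copy (Id, Name)
OUTPUT INSERTED.Id, INSERTED.Name
SELECT Id, Name
FROM #Source
WHERE Id >= 2;
```

The result set contains the rows for Id 2 and 3 (in no guaranteed order), and #Copy holds the same two rows. One restriction to keep in mind: OUTPUT without INTO is not allowed when the target table has an enabled trigger for that DML action, which is not an issue for a temp table like #t.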

Sql-server – Cannot set NOCOUNT to OFF inside the trigger execution because the server option “disallow_results_from_triggers” is true

Find the triggers that have SET NOCOUNT OFF and fix them. I can't possibly imagine what good can come from having that line anywhere. NOCOUNT should be ON in every single module on your server.
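A minimal sketch of a trigger that follows this advice, with SET NOCOUNT ON as the first statement of the body. The tables dbo.NocountDemo and dbo.NocountDemoAudit and the trigger name are hypothetical, made up for illustration.

```sql
-- Hypothetical demo tables; the names are illustrative only.
CREATE TABLE dbo.NocountDemo (Id INT NOT NULL PRIMARY KEY, Note NVARCHAR(50) NULL);
CREATE TABLE dbo.NocountDemoAudit (Id INT NOT NULL, AuditedAt DATETIME NOT NULL);
GO

CREATE TRIGGER dbo.trg_NocountDemo_Insert
ON dbo.NocountDemo
AFTER INSERT
AS
BEGIN
    -- First statement in the trigger: stop the trigger's own statements from
    -- sending "(n row(s) affected)" messages back to the caller.
    SET NOCOUNT ON;

    INSERT INTO dbo.NocountDemoAudit (Id, AuditedAt)
    SELECT Id, GETDATE()
    FROM inserted;
END
GO

INSERT INTO dbo.NocountDemo (Id, Note) VALUES (1, N'first row');
```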

The query below returns the definition of every trigger that appears to contain SET NOCOUNT OFF (the pattern match can produce false positives, so check the results closely). Each definition is preceded by a GO/comment header naming the trigger; paste the output into a query window, change CREATE to ALTER, fix the NOCOUNT line, and run it:

    SELECT 'GO
    -- ' + COALESCE(QUOTENAME(OBJECT_SCHEMA_NAME(t.[object_id])) + '.', '') + QUOTENAME(t.name) + '
    GO',
      m.definition
      FROM sys.triggers AS t
      INNER JOIN sys.sql_modules AS m
      ON t.[object_id] = m.[object_id]
      WHERE LOWER(m.definition) LIKE N'%set%nocount%off%';

You may need to repeat this task in multiple databases.
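One way to repeat the search across databases is a cursor over sys.databases that runs the same query through each database's sys.sp_executesql and collects the matches into one temp table. This is a sketch under a few assumptions: only online user databases you can access (database_id > 4, HAS_DBACCESS = 1) are searched, and #hits is a table name made up for this example.

```sql
SET NOCOUNT ON;

-- Hypothetical collection table for the matches from every database.
CREATE TABLE #hits (
    DatabaseName SYSNAME,
    TriggerName  NVARCHAR(520),
    Definition   NVARCHAR(MAX)
);

DECLARE @db SYSNAME, @exec NVARCHAR(300);
DECLARE @sql NVARCHAR(MAX) = N'
SELECT DB_NAME(),
       COALESCE(QUOTENAME(OBJECT_SCHEMA_NAME(t.[object_id])) + N''.'', N'''') + QUOTENAME(t.name),
       m.definition
  FROM sys.triggers AS t
  INNER JOIN sys.sql_modules AS m
  ON t.[object_id] = m.[object_id]
  WHERE LOWER(m.definition) LIKE N''%set%nocount%off%'';';

DECLARE dbs CURSOR LOCAL FAST_FORWARD FOR
    SELECT name
    FROM sys.databases
    WHERE database_id > 4              -- skip master, tempdb, model, msdb
      AND state_desc = N'ONLINE'
      AND HAS_DBACCESS(name) = 1;

OPEN dbs;
FETCH NEXT FROM dbs INTO @db;
WHILE @@FETCH_STATUS = 0
BEGIN
    -- [db].sys.sp_executesql runs the batch in the context of that database
    SET @exec = QUOTENAME(@db) + N'.sys.sp_executesql';
    INSERT INTO #hits (DatabaseName, TriggerName, Definition)
        EXEC @exec @sql;
    FETCH NEXT FROM dbs INTO @db;
END
CLOSE dbs;
DEALLOCATE dbs;

SELECT DatabaseName, TriggerName, Definition
FROM #hits
ORDER BY DatabaseName, TriggerName;
```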

Once you've fixed all of those, try the job again. If the error keeps happening, try to narrow down the trigger(s) responsible and inspect their definitions more closely. You may have to reverse-engineer the job and everything it calls in order to find them, but once you have, post it here.
