Sql-server – Identifying which values do NOT match a table row

Tags: except, sql-server, sql-server-2005

I would like to be able to easily check which unique identifiers do not exist in a table, of those supplied in a query.

To better explain, here's what I would do now, to check which IDs of the list "1, 2, 3, 4" do not exist in a table:

1. Run SELECT * FROM dbo."TABLE" WHERE "ID" IN ('1','2','3','4'). Let's say the table contains no row with ID 2.
2. Dump the results into Excel.
3. Run a VLOOKUP on the original list that searches for each list value in the result list.
4. Any VLOOKUP that results in an #N/A is on a value that did not occur in the table.

I'm thinking there's got to be a better way to do this. I'm looking, ideally, for something like

List to check -> Query on table to check -> Members of list not in table

Best Answer

Use EXCEPT:

SELECT * FROM
  (values (1),(2),(3),(4)) as T(ID)
EXCEPT
SELECT ID 
FROM [TABLE];

See SqlFiddle.

The values constructor will only work on SQL Server 2008 or later. For 2005, use

SELECT 'value'
UNION SELECT 'value'

as detailed in this SO answer.
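
Spelled out for the list in the question, a 2005-compatible version might look like the sketch below. It assumes the ID column is numeric; if it holds strings, as the quoted literals in the question suggest, quote the values so both sides of the EXCEPT have matching types.

```sql
-- Build the list to check as a derived table, one SELECT per value.
-- UNION ALL skips the duplicate-removal step that UNION would add;
-- EXCEPT returns distinct rows anyway.
SELECT ID
FROM (
    SELECT 1 AS ID
    UNION ALL SELECT 2
    UNION ALL SELECT 3
    UNION ALL SELECT 4
) AS T
EXCEPT
-- Remove every ID that does exist in the table.
SELECT ID
FROM [TABLE];
```

With the question's example data (no row with ID 2), this should return the single value 2.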

Related Solutions

Sql-server – Is it better to delete then insert, or to update then insert in SQL server

UPDATE then INSERT usually. Simply, it's less work.

In this case, you have an ID column (IDENTITY); I'll assume this is the clustered index.

You delete rows, you leave gaps in pages = fragmentation. You add rows, you probably need more pages allocated. Other processes are doing this too.

An UPDATE will update in situ, and you'll have a less expensive INSERT because there are fewer rows to insert.

Saying that...

If your new:update ratio is 100:1 then it doesn't really matter, of course. And the EXISTS check is required either way.

However, from a raw "shifting data" perspective, UPDATE..INSERT would be my choice.
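
As a rough sketch of the UPDATE..INSERT pattern: the table names dbo.Target and dbo.Staging and the columns BusinessKey and Val are hypothetical, chosen only for illustration. The IDENTITY column is left out of the INSERT so that SQL Server generates it.

```sql
-- Hypothetical schema: dbo.Target(ID IDENTITY, BusinessKey, Val)
--                      dbo.Staging(BusinessKey, Val)
BEGIN TRANSACTION;

-- Step 1: update rows that already exist, in place.
UPDATE t
   SET t.Val = s.Val
  FROM dbo.Target AS t
  INNER JOIN dbo.Staging AS s
    ON s.BusinessKey = t.BusinessKey;

-- Step 2: insert only the rows that are genuinely new.
-- This is the existence check mentioned above.
INSERT INTO dbo.Target (BusinessKey, Val)
SELECT s.BusinessKey, s.Val
  FROM dbo.Staging AS s
 WHERE NOT EXISTS (SELECT 1
                     FROM dbo.Target AS t
                    WHERE t.BusinessKey = s.BusinessKey);

COMMIT TRANSACTION;
```

Wrapping both statements in one transaction keeps other sessions from seeing the table half-updated. Whether that is needed depends on your workload.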

Sql-server – Identifying rows which don’t match a master row

You can also do this with dynamic SQL without having to manually build out all the column names.

DECLARE @sql NVARCHAR(MAX), @c1 NVARCHAR(MAX), @c2 NVARCHAR(MAX);

SELECT @c1 = N'', @c2 = N'';

SELECT 
  @c1 = @c1 + ',' + QUOTENAME(name),
  @c2 = @c2 + ' AND m.' + QUOTENAME(name) + ' = s.' + QUOTENAME(name)
 FROM sys.columns
 WHERE name <> 'LocationID'
 AND [object_id] = OBJECT_ID('dbo.table1');

SET @sql = ';WITH s AS (
       SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1
       EXCEPT 
       SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1_master
     ) 
     SELECT m.LocationID
 FROM s INNER JOIN dbo.table1 AS m ON 1 = 1
 ' + @c2;

SELECT @sql;
--EXEC sp_executesql @sql;

You can take the output of this query as is and store the query somewhere, or you can comment out the SELECT and uncomment the EXEC and leave it as permanent dynamic SQL - in this case it will automatically adapt to column changes in the two tables.
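
To make the generated text concrete, suppose (purely as an assumption for illustration) that dbo.table1 has the columns LocationID, setting1 and setting2. Whitespace aside, the SELECT @sql output would then look roughly like this; the column order may differ, because the query against sys.columns has no ORDER BY:

```sql
;WITH s AS (
       SELECT [setting1],[setting2] FROM dbo.table1
       EXCEPT
       SELECT [setting1],[setting2] FROM dbo.table1_master
     )
     SELECT m.LocationID
 FROM s INNER JOIN dbo.table1 AS m ON 1 = 1
  AND m.[setting1] = s.[setting1] AND m.[setting2] = s.[setting2]
```

The ON 1 = 1 is there only so that every generated predicate can start with AND, which keeps the concatenation simple.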

Another idea (assuming LocationID is unique) - and it occurred to me you may want to include the master row so you can quickly spot the columns that are different:

  ;WITH c AS 
  (
    SELECT t.LocationID, m.setting1, m.setting2, ...
      FROM dbo.table1 AS t CROSS JOIN dbo.table1_master AS m
  )
  SELECT DISTINCT src = '> master', setting1, setting2, ...
    FROM c
  UNION ALL
  (
    SELECT RTRIM(LocationID), setting1, setting2, ...
      FROM dbo.table1
    EXCEPT
    SELECT RTRIM(LocationID), setting1, setting2, ...
      FROM c
  )
  ORDER BY src;

This version is a little cheaper (mostly by avoiding the DISTINCT against the master table, at the cost of needing to specify all of the columns one more time - which again you can automate as per above):

  ;WITH m AS 
  (
    SELECT setting1, setting2, ... 
      FROM dbo.table1_master
  ),
  c AS 
  (
    SELECT src = RTRIM(t.LocationID), m.setting1, m.setting2, ...
      FROM dbo.table1 AS t CROSS JOIN m
  )
  SELECT src = '> master', setting1, setting2, ...
    FROM m
  UNION ALL
  (
    SELECT RTRIM(LocationID), setting1, setting2, ...
      FROM dbo.table1
    EXCEPT
    SELECT src, setting1, setting2, ...
      FROM c
  )
  ORDER BY src;

However, all of these options are poorer performers with worse plans than Rachel's simple LEFT JOIN. I tried to stick to the theme of using EXCEPT even though it is more about syntax than performance.

The key takeaway is that if the column count is too high to deal with manually, you can use the dynamic SQL approach above to construct whatever query you want to use - and you can do that one time and store the result, or have the code generated every time. To generate Rachel's query using dynamic SQL, not much needs to change:

DECLARE @sql NVARCHAR(MAX), @and NVARCHAR(MAX), @anycol NVARCHAR(128);
SELECT @sql = N'', @and = N'';

SELECT @and = @and + ' AND t.' + QUOTENAME(name) + ' = m.' + QUOTENAME(name)
  FROM sys.columns
  WHERE [object_id] = OBJECT_ID('dbo.table1_master');

SELECT TOP (1) @anycol = QUOTENAME(name)
  FROM sys.columns
  WHERE [object_id] = OBJECT_ID('dbo.table1_master')
  ORDER BY name;

SET @sql = 'SELECT locationID
FROM dbo.table1 AS t
LEFT OUTER JOIN dbo.table1_master AS m ON 1 = 1' 
  + @and + ' WHERE m.' + @anycol + ' IS NULL;';

SELECT @sql;
--EXEC sp_executesql @sql;
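
Assuming, again only for illustration, that dbo.table1_master has just the columns setting1 and setting2, the generated statement would look roughly like this (the order of the predicates may differ):

```sql
SELECT locationID
FROM dbo.table1 AS t
LEFT OUTER JOIN dbo.table1_master AS m ON 1 = 1
  AND t.[setting1] = m.[setting1]
  AND t.[setting2] = m.[setting2]
WHERE m.[setting1] IS NULL;
```

One difference from the EXCEPT versions to keep in mind: the = comparison never matches NULL to NULL, while EXCEPT treats two NULLs as equal. So if the setting columns are nullable, the two approaches can return different rows.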
