Find rows matching only values from a specific list with SQL

exceptquery

I have a table like this:

  Id    Status
  --    -----------
  1     Closed
  1     In progress
  2     Rejected
  3     Cancelled
  4     Rejected
  4     Closed
  5     In progress

I want to show rows that have the Status of closed, rejected or cancelled. However, I want to exclude the IDs that have a different status value, even if they also have any of those three.

So, for my example I would like the output to be this:

  Id    Status
  --    ---------
  2     Rejected
  3     Cancelled
  4     Rejected
  4     Closed

What should my query look like?

Best Answer

One way would be to use NOT EXISTS.

SELECT *
FROM   YourTable T1
WHERE  Status IN ( 'Closed', 'Rejected', 'Cancelled' )
       AND NOT EXISTS (SELECT *
                       FROM   YourTable T2
                       WHERE  T2.Id = T1.Id
                              AND T2.Status NOT IN ( 'Closed', 'Rejected', 'Cancelled' )) ;

Or an alternative method if your DBMS supports windowed aggregates..

SELECT Id,
       Status
FROM   (SELECT *,
               COUNT(CASE WHEN Status NOT IN ('Closed','Rejected','Cancelled') 
                             THEN 1 END) 
                     OVER (PARTITION BY Id) C
        FROM   YourTable) D
WHERE  Status IN ( 'Closed', 'Rejected', 'Cancelled' )
       AND C = 0 ;

And another option using EXCEPT:

SELECT * 
FROM YourTable 
WHERE Id IN 
      ( SELECT Id FROM YourTable 
        WHERE Status IN ( 'Closed', 'Rejected', 'Cancelled' ) 
      EXCEPT 
        SELECT Id FROM YourTable 
        WHERE Status NOT IN ( 'Closed', 'Rejected', 'Cancelled' )
      ) ;

Related Solutions

Sql-server – Easily show rows that are different between two tables or queries

You don't need 30 join conditions for a FULL OUTER JOIN here.

You can just Full Outer Join on the PK, preserve rows with at least one difference with WHERE EXISTS (SELECT A.* EXCEPT SELECT B.*) and use CROSS APPLY (SELECT A.* UNION ALL SELECT B.*) to unpivot out both sides of the JOINed rows into individual rows.

WITH TableA(Col1, Col2, Col3) 
     AS (SELECT 'Dog',1,1     UNION ALL 
         SELECT 'Cat',27,86   UNION ALL 
         SELECT 'Cat',128,92), 
     TableB(Col1, Col2, Col3) 
     AS (SELECT 'Dog',1,1     UNION ALL 
         SELECT 'Cat',27,105  UNION ALL 
         SELECT 'Lizard',83,NULL) 
SELECT CA.*
FROM   TableA A 
       FULL OUTER JOIN TableB B 
         ON A.Col1 = B.Col1 
            AND A.Col2 = B.Col2 
/*Unpivot the joined rows*/
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
             SELECT 'TableB' AS what, B.*) AS CA     
/*Exclude identical rows*/
WHERE  EXISTS (SELECT A.* 
               EXCEPT 
               SELECT B.*) 
/*Discard NULL extended row*/
AND CA.Col1 IS NOT NULL      
ORDER BY CA.Col1, CA.Col2

Gives

what   Col1   Col2        Col3
------ ------ ----------- -----------
TableA Cat    27          86
TableB Cat    27          105
TableA Cat    128         92
TableB Lizard 83          NULL

Or a version dealing with the moved goalposts.

SELECT DISTINCT CA.*
FROM   TableA A 
       FULL OUTER JOIN TableB B 
         ON EXISTS (SELECT A.*  INTERSECT  SELECT B.*) 
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
             SELECT 'TableB' AS what, B.*) AS CA     
WHERE NOT EXISTS (SELECT A.*  INTERSECT  SELECT B.*) 
AND CA.Col1 IS NOT NULL
ORDER BY CA.Col1, CA.Col2

For tables with many columns it can still be difficult to identify the specific column(s) that differ. For that you can potentially use the below.

(though just on relatively small tables as otherwise this method likely won't have adequate performance)

SELECT t1.primary_key,
       y1.c,
       y1.v,
       y2.v
FROM   t1
       JOIN t2
         ON t1.primary_key = t2.primary_key
       CROSS APPLY (SELECT t1.*
                    FOR xml path('row'), elements xsinil, type) x1(x)
       CROSS APPLY (SELECT t2.*
                    FOR xml path('row'), elements xsinil, type) x2(x)
       CROSS APPLY (SELECT n.n.value('local-name(.)', 'sysname'),
                           n.n.value('.', 'nvarchar(max)')
                    FROM   x1.x.nodes('row/*') AS n(n)) y1(c, v)
       CROSS APPLY (SELECT n.n.value('local-name(.)', 'sysname'),
                           n.n.value('.', 'nvarchar(max)')
                    FROM   x2.x.nodes('row/*') AS n(n)) y2(c, v)
WHERE  y1.c = y2.c
       AND EXISTS(SELECT y1.v
                  EXCEPT
                  SELECT y2.v)

Sql-server – Identifying rows which don’t match a master row

You can also do this with dynamic SQL without having to manually build out all the column names.

DECLARE @sql NVARCHAR(MAX), @c1 NVARCHAR(MAX), @c2 NVARCHAR(MAX);

SELECT @c1 = N'', @c2 = N'';

SELECT 
  @c1 = @c1 + ',' + QUOTENAME(name),
  @c2 = @c2 + ' AND m.' + QUOTENAME(name) + ' = s.' + QUOTENAME(name)
 FROM sys.columns
 WHERE name <> 'LocationID'
 AND [object_id] = OBJECT_ID('dbo.table1');

SET @sql = ';WITH s AS (
       SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1
       EXCEPT 
       SELECT ' + STUFF(@c1, 1, 1, '') + ' FROM dbo.table1_master
     ) 
     SELECT m.LocationID
 FROM s INNER JOIN dbo.table1 AS m ON 1 = 1
 ' + @c2;

SELECT @sql;
--EXEC sp_executesql @sql;

You can take the output of this query as is and store the query somewhere, or you can comment out the SELECT and uncomment the EXEC and leave it as permanent dynamic SQL - in this case it will automatically adapt to column changes in the two tables.

Another idea (assuming LocationID is unique) - and it occurred to me you may want to include the master row so you can quickly spot the columns that are different:

  ;WITH c AS 
  (
    SELECT t.LocationID, m.setting1, m.setting2, ...
      FROM dbo.table1 AS t CROSS JOIN dbo.table1_master AS m
  )
  SELECT DISTINCT src = '> master', setting1, setting2, ...
    FROM c
  UNION ALL
  (
    SELECT RTRIM(LocationID), setting1, setting2, ...
      FROM dbo.table1
    EXCEPT
    SELECT RTRIM(LocationID), setting1, setting2, ...
      FROM c
  )
  ORDER BY src;

This version is a little cheaper (mostly by avoiding the DISTINCT against the master table, at the cost of needing to specify all of the columns one more time - which again you can automate as per above):

  ;WITH m AS 
  (
    SELECT setting1, setting2, ... 
      FROM dbo.table1_master
  ),
  c AS 
  (
    SELECT src = RTRIM(t.LocationID), m.setting1, m.setting2, ...
      FROM dbo.table1 AS t CROSS JOIN m
  )
  SELECT src = '> master', setting1, setting2, ...
    FROM m
  UNION ALL
  (
    SELECT RTRIM(LocationID), setting1, setting2, ...
      FROM dbo.table1
    EXCEPT
    SELECT src, setting1, setting2, ...
      FROM c
  )
  ORDER BY src;

However all of these options are poorer performers with worse plans than Rachel's simple LEFT JOIN. I tried to stick to the theme of using EXCEPT even though it is more about syntax than performance.

The key takeaway is that if the column count is too high to deal with manually, you can use the dynamic SQL approach above to construct whatever query you want to use - and you can do that one time and store the result, or have the code generated every time. To generate Rachel's query using dynamic SQL, not much needs to change:

DECLARE @sql NVARCHAR(MAX), @and NVARCHAR(MAX), @anycol NVARCHAR(128);
SELECT @sql = N'', @and = N'';

SELECT @and = @and + ' AND t.' + QUOTENAME(name) + ' = m.' + QUOTENAME(name)
  FROM sys.columns
  WHERE [object_id] = OBJECT_ID('dbo.table1_master');

SELECT TOP (1) @anycol = QUOTENAME(name)
  FROM sys.columns
  WHERE [object_id] = OBJECT_ID('dbo.table1_master')
  ORDER BY name;

SET @sql = 'SELECT locationID
FROM dbo.table1 AS t
LEFT OUTER JOIN dbo.table1_master AS m ON 1 = 1' 
  + @and + ' WHERE m.' + @anycol + ' IS NULL;';

SELECT @sql;
--EXEC sp_executesql @sql;

Best Answer

Related Solutions

Sql-server – Easily show rows that are different between two tables or queries

Sql-server – Identifying rows which don’t match a master row

Related Question