Sql-server – How to find missing data from one table that is a join of two other tables

except, sql-server

I have two tables, stock and location, with a third table that retains the level of stock at each location, stock_loc_info.

Each stock item should have a row for each location.

What SQL query would show that the row (B, 1) is missing from stock_loc_info?

stock.stockcode
---------------
A
B

location.locno
--------------
1
2

stock_loc_info.fkstockcode  stock_loc_info.fklocno
--------------------------  ----------------------
A                           1
A                           2
B                           2

Best Answer

You need every combination of rows from the two tables, which is a CROSS JOIN, and then you remove the combinations that already exist in the third table, using either NOT EXISTS or EXCEPT. The EXCEPT version:

SELECT 
    s.stockcode, l.locno
FROM
    dbo.stock AS s
  CROSS JOIN
    dbo.location AS l

EXCEPT

SELECT 
    si.fkstockcode, si.fklocno
FROM 
    dbo.stock_loc_info AS si ;
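
The answer mentions NOT EXISTS as the alternative but only shows EXCEPT. A minimal sketch of the NOT EXISTS form follows. To keep it self-contained, the three tables are stood in for by CTEs holding the sample rows from the question (an assumption made for illustration; against the real tables you would drop the WITH clause and use dbo.stock, dbo.location and dbo.stock_loc_info).

```sql
-- Sample data from the question, stood in as CTEs (illustrative only).
WITH stock(stockcode) AS
     (SELECT 'A' UNION ALL
      SELECT 'B'),
     location(locno) AS
     (SELECT 1 UNION ALL
      SELECT 2),
     stock_loc_info(fkstockcode, fklocno) AS
     (SELECT 'A', 1 UNION ALL
      SELECT 'A', 2 UNION ALL
      SELECT 'B', 2)
SELECT s.stockcode, l.locno
FROM   stock AS s
       CROSS JOIN location AS l          -- every stock/location combination
WHERE  NOT EXISTS (SELECT 1              -- keep only combinations with no row
                   FROM   stock_loc_info AS si
                   WHERE  si.fkstockcode = s.stockcode
                          AND si.fklocno = l.locno);
```

With the sample rows this returns the single row B, 1. The two forms are not strictly interchangeable: EXCEPT removes duplicate rows and treats NULLs as equal, while the = comparisons in the NOT EXISTS subquery never match a NULL. For key columns that are unique and not nullable, both give the same result.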

Related Solutions

Sql-server – Easily show rows that are different between two tables or queries

You don't need 30 join conditions for a FULL OUTER JOIN here.

You can just FULL OUTER JOIN on the PK, keep only the rows with at least one difference using WHERE EXISTS (SELECT A.* EXCEPT SELECT B.*), and use CROSS APPLY (SELECT A.* UNION ALL SELECT B.*) to unpivot both sides of each joined row into individual rows.

WITH TableA(Col1, Col2, Col3) 
     AS (SELECT 'Dog',1,1     UNION ALL 
         SELECT 'Cat',27,86   UNION ALL 
         SELECT 'Cat',128,92), 
     TableB(Col1, Col2, Col3) 
     AS (SELECT 'Dog',1,1     UNION ALL 
         SELECT 'Cat',27,105  UNION ALL 
         SELECT 'Lizard',83,NULL) 
SELECT CA.*
FROM   TableA A 
       FULL OUTER JOIN TableB B 
         ON A.Col1 = B.Col1 
            AND A.Col2 = B.Col2 
/*Unpivot the joined rows*/
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
             SELECT 'TableB' AS what, B.*) AS CA     
/*Exclude identical rows*/
WHERE  EXISTS (SELECT A.* 
               EXCEPT 
               SELECT B.*) 
/*Discard NULL extended row*/
AND CA.Col1 IS NOT NULL      
ORDER BY CA.Col1, CA.Col2

Gives

what   Col1   Col2        Col3
------ ------ ----------- -----------
TableA Cat    27          86
TableB Cat    27          105
TableA Cat    128         92
TableB Lizard 83          NULL
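
The EXISTS (SELECT A.* EXCEPT SELECT B.*) predicate is used rather than a column-by-column <> comparison because EXCEPT treats two NULLs as equal and a NULL as different from any value. A small sketch of that difference, using a made-up CTE T with two columns a and b to compare (the names are illustrative, not from the original answer):

```sql
-- T is a hypothetical four-row table used only to contrast the two predicates.
WITH T(id, a, b) AS
     (SELECT 1, 105, 105   UNION ALL   -- equal values
      SELECT 2, 86, 105    UNION ALL   -- different values
      SELECT 3, 86, NULL   UNION ALL   -- value versus NULL
      SELECT 4, NULL, NULL)            -- NULL on both sides
SELECT id,
       -- a <> b evaluates to UNKNOWN whenever either side is NULL,
       -- so rows 3 and 4 both fall through to the ELSE branch.
       CASE WHEN a <> b THEN 'differs' ELSE 'not flagged' END AS using_inequality,
       -- EXCEPT compares NULL-safely: row 3 differs, row 4 is the same.
       CASE WHEN EXISTS (SELECT a EXCEPT SELECT b)
            THEN 'differs' ELSE 'same' END AS using_except
FROM   T
ORDER  BY id;
```

Row 3 is the interesting case: the inequality test does not flag it, while the EXCEPT test reports it as different. This is why the Cat/27 and Lizard rows with a NULL or changed Col3 are picked up correctly in the output above.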

Or a version dealing with the moved goalposts. It matches rows on all columns (NULL-safe, via INTERSECT) instead of on a key.

SELECT DISTINCT CA.*
FROM   TableA A 
       FULL OUTER JOIN TableB B 
         ON EXISTS (SELECT A.*  INTERSECT  SELECT B.*) 
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
             SELECT 'TableB' AS what, B.*) AS CA     
WHERE NOT EXISTS (SELECT A.*  INTERSECT  SELECT B.*) 
AND CA.Col1 IS NOT NULL
ORDER BY CA.Col1, CA.Col2

For tables with many columns it can still be difficult to identify the specific column(s) that differ. For that you can potentially use the query below, though only on relatively small tables, as otherwise this method likely won't perform adequately.

SELECT t1.primary_key,
       y1.c,
       y1.v,
       y2.v
FROM   t1
       JOIN t2
         ON t1.primary_key = t2.primary_key
       CROSS APPLY (SELECT t1.*
                    FOR xml path('row'), elements xsinil, type) x1(x)
       CROSS APPLY (SELECT t2.*
                    FOR xml path('row'), elements xsinil, type) x2(x)
       CROSS APPLY (SELECT n.n.value('local-name(.)', 'sysname'),
                           n.n.value('.', 'nvarchar(max)')
                    FROM   x1.x.nodes('row/*') AS n(n)) y1(c, v)
       CROSS APPLY (SELECT n.n.value('local-name(.)', 'sysname'),
                           n.n.value('.', 'nvarchar(max)')
                    FROM   x2.x.nodes('row/*') AS n(n)) y2(c, v)
WHERE  y1.c = y2.c
       AND EXISTS(SELECT y1.v
                  EXCEPT
                  SELECT y2.v)

Sql-server – Quick way to validate two tables against each other

Here's what I've done before:

(SELECT 'TableA', * FROM TableA
EXCEPT
SELECT 'TableA', * FROM TableB)
UNION ALL
(SELECT 'TableB', * FROM TableB
EXCEPT
SELECT 'TableB', * FROM TableA)

It's worked well enough on tables that are about 1,000,000 rows, but I'm not sure how well that would work on extremely large tables.

Added:

I've run the query against my system, comparing two tables with 21 fields of regular types in two different databases attached to the same server running SQL Server 2005. The table has about 3 million rows, and about 25,000 rows differ. The primary key on the table is unusual, however: it's a composite key of 10 fields (it's an audit table).

The execution plans for the queries have a total cost of 184.25879 for UNION and 184.22983 for UNION ALL. The tree cost differs only at the last step before returning rows, the concatenation.

Actually executing either query takes about 42s, plus about 3s to transmit the rows. The two queries take the same time.

Second Addition:

This is actually extremely fast, each one running against 3 million rows in about 2.5s:

SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM TableA

SELECT CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM TableB

If the results of those don't match, you know the tables are different. However, if the results do match, you're not guaranteed that the tables are identical because of the [highly unlikely] chance of checksum collisions.
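
To read the two checksums side by side, the two statements can be folded into one result set. This is a sketch rather than part of the original test: the CTEs hold a few made-up rows so the query runs on its own, and the row_count column is an addition, included because CHECKSUM_AGG returns NULL for an empty input and a count is a cheap extra sanity check.

```sql
-- TableA and TableB are stand-in CTEs with illustrative rows;
-- against real tables, drop the first two CTEs and keep Summary.
WITH TableA(Col1, Col2, Col3) AS
     (SELECT 'Dog', 1, 1 UNION ALL
      SELECT 'Cat', 27, 86),
     TableB(Col1, Col2, Col3) AS
     (SELECT 'Dog', 1, 1 UNION ALL
      SELECT 'Cat', 27, 105),
     Summary(src, row_count, agg_checksum) AS
     (SELECT 'TableA', COUNT(*), CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM TableA
      UNION ALL
      SELECT 'TableB', COUNT(*), CHECKSUM_AGG(BINARY_CHECKSUM(*)) FROM TableB)
SELECT src, row_count, agg_checksum
FROM   Summary;
```

If either column differs between the two rows, the tables differ. If both match, the tables are probably identical, subject to the collision caveat above.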

I'm not sure how datatype differences between the tables would affect this calculation. To check for them, I would run the comparison query against the system views or the INFORMATION_SCHEMA views.
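
One way to do that is to apply the same EXCEPT / UNION ALL pattern to INFORMATION_SCHEMA.COLUMNS. The sketch below assumes both tables are in the current database, in schema dbo, and are named TableA and TableB. Those names are placeholders, and the choice of which column attributes to compare is mine.

```sql
-- Assumptions: current database, schema dbo, tables named TableA and TableB.
-- Column definitions present in TableA but missing or different in TableB...
(SELECT 'TableA' AS src, COLUMN_NAME, DATA_TYPE,
        CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE, IS_NULLABLE
 FROM   INFORMATION_SCHEMA.COLUMNS
 WHERE  TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'TableA'
 EXCEPT
 SELECT 'TableA', COLUMN_NAME, DATA_TYPE,
        CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE, IS_NULLABLE
 FROM   INFORMATION_SCHEMA.COLUMNS
 WHERE  TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'TableB')
UNION ALL
-- ...and the reverse direction.
(SELECT 'TableB', COLUMN_NAME, DATA_TYPE,
        CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE, IS_NULLABLE
 FROM   INFORMATION_SCHEMA.COLUMNS
 WHERE  TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'TableB'
 EXCEPT
 SELECT 'TableB', COLUMN_NAME, DATA_TYPE,
        CHARACTER_MAXIMUM_LENGTH, NUMERIC_PRECISION, NUMERIC_SCALE, IS_NULLABLE
 FROM   INFORMATION_SCHEMA.COLUMNS
 WHERE  TABLE_SCHEMA = 'dbo' AND TABLE_NAME = 'TableA');
```

An empty result means the compared column attributes match. INFORMATION_SCHEMA views are scoped to a single database, so when the tables live in different databases each side would need its own database prefix on INFORMATION_SCHEMA.COLUMNS.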

I tried the query against another table with 5 million rows and that one ran in about 5s, so it appears to be largely O(n).
