You don't need 30 join conditions for a FULL OUTER JOIN
here.
You can just Full Outer Join on the PK, preserve rows with at least one difference with WHERE EXISTS (SELECT A.* EXCEPT SELECT B.*)
and use CROSS APPLY (SELECT A.* UNION ALL SELECT B.*)
to unpivot out both sides of the JOIN
ed rows into individual rows.
WITH TableA(Col1, Col2, Col3)
AS (SELECT 'Dog',1,1 UNION ALL
SELECT 'Cat',27,86 UNION ALL
SELECT 'Cat',128,92),
TableB(Col1, Col2, Col3)
AS (SELECT 'Dog',1,1 UNION ALL
SELECT 'Cat',27,105 UNION ALL
SELECT 'Lizard',83,NULL)
SELECT CA.*
FROM TableA A
FULL OUTER JOIN TableB B
ON A.Col1 = B.Col1
AND A.Col2 = B.Col2
/*Unpivot the joined rows*/
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
SELECT 'TableB' AS what, B.*) AS CA
/*Exclude identical rows*/
WHERE EXISTS (SELECT A.*
EXCEPT
SELECT B.*)
/*Discard NULL extended row*/
AND CA.Col1 IS NOT NULL
ORDER BY CA.Col1, CA.Col2
Gives
what Col1 Col2 Col3
------ ------ ----------- -----------
TableA Cat 27 86
TableB Cat 27 105
TableA Cat 128 92
TableB Lizard 83 NULL
Or a version dealing with the moved goalposts.
SELECT DISTINCT CA.*
FROM TableA A
FULL OUTER JOIN TableB B
ON EXISTS (SELECT A.* INTERSECT SELECT B.*)
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
SELECT 'TableB' AS what, B.*) AS CA
WHERE NOT EXISTS (SELECT A.* INTERSECT SELECT B.*)
AND CA.Col1 IS NOT NULL
ORDER BY CA.Col1, CA.Col2
For tables with many columns it can still be difficult to identify the specific column(s) that differ. For that you can potentially use the below.
(though just on relatively small tables as otherwise this method likely won't have adequate performance)
SELECT t1.primary_key,
y1.c,
y1.v,
y2.v
FROM t1
JOIN t2
ON t1.primary_key = t2.primary_key
CROSS APPLY (SELECT t1.*
FOR xml path('row'), elements xsinil, type) x1(x)
CROSS APPLY (SELECT t2.*
FOR xml path('row'), elements xsinil, type) x2(x)
CROSS APPLY (SELECT n.n.value('local-name(.)', 'sysname'),
n.n.value('.', 'nvarchar(max)')
FROM x1.x.nodes('row/*') AS n(n)) y1(c, v)
CROSS APPLY (SELECT n.n.value('local-name(.)', 'sysname'),
n.n.value('.', 'nvarchar(max)')
FROM x2.x.nodes('row/*') AS n(n)) y2(c, v)
WHERE y1.c = y2.c
AND EXISTS(SELECT y1.v
EXCEPT
SELECT y2.v)
Using loop back linked servers isn't the greatest idea. This can lead to some interesting performance problems once you start doing joins between local tables and "remote" tables as you are going through a linked server so suddenly instead of doing a normal join you may end up transferring the entire table across the linked server (which means possibly reading it off of the disk), stuffing it into tempdb and joining to a hash table which doesn't have any indexes on it.
Don't setup linked servers unless you actually need them. You can get some very powerful servers these days for not much money (I'm currently working on a server which has 32 cores and 256 Gigs of RAM) so scaling a server up isn't all that hard. If the application actually needs to be scaled out across multiple servers you'll need someone who is used to doing very complex tuning to help out to make sure that you aren't shooting yourself in the foot in the process (which isn't hard to do).
Once you do get to needing linked servers, there's going to be so much stuff that needs to be done to move the database and test everything going through and fixing the code isn't going to seem all that bad.
Best Answer
This is referred to as the intersection of sets. One way to perform the query is the use the
INTERSECT
operator:INTERSECT
returns distinct rows that are output by both the left and right inputs. The first part of the query retrieves the artist_id where the genre_id is 1, the second part of the query retrieves the artist_id where the genre_id is 2, and theINTERSECT
operator then compares both lists, returning only rows that match precisely, eliminating duplicates.This is very similar to using an
INNER JOIN
against two common-table-expressions, commonly referred to as CTEs, as in:If you don't like common-table-expressions (CTEs), this looks like: