SQL Server Query – Select Rows with Multiple Required Filters

sql serversql-server-2008

How can I select artists who have punk and rock in their musical genres? I need artists with both genres.

A example:

SELECT * FROM artist_genre WHERE genre_id IN (1,2)

In this way I recover all that belong to punk, or rock. But I only need artists who own both. Is there a easy way for this?

If a change in the structure of the databases is required, I'm up to do.

Best Answer

This is referred to as the intersection of sets. One way to perform the query is the use the INTERSECT operator:

SELECT ag.artist_id 
FROM dbo.artists_genres ag
WHERE ag.genre_id = 1
INTERSECT
SELECT ag.artist_id 
FROM dbo.artists_genres ag
WHERE ag.genre_id = 2

INTERSECT returns distinct rows that are output by both the left and right inputs. The first part of the query retrieves the artist_id where the genre_id is 1, the second part of the query retrieves the artist_id where the genre_id is 2, and the INTERSECT operator then compares both lists, returning only rows that match precisely, eliminating duplicates.

This is very similar to using an INNER JOIN against two common-table-expressions, commonly referred to as CTEs, as in:

;WITH g1 AS
(
    SELECT *
    FROM dbo.artists_genres ag1
    WHERE ag1.genre_id = 1
), g2 AS
(
    SELECT *
    FROM dbo.artists_genres ag1
    WHERE ag1.genre_id = 2
)
SELECT g1.artist_id
FROM g1
    INNER JOIN g2 ON g1.artist_id = g2.artist_id;

If you don't like common-table-expressions (CTEs), this looks like:

SELECT g1.artist_id
FROM
    (
        SELECT *
        FROM dbo.artists_genres ag1
        WHERE ag1.genre_id = 1
    ) g1
    INNER JOIN
        (
            SELECT *
            FROM dbo.artists_genres ag1
            WHERE ag1.genre_id = 2
        ) g2 ON g1.artist_id = g2.artist_id

Related Solutions

SQL Server Row Differences – How to Show Rows Different Between Two Tables or Queries

You don't need 30 join conditions for a FULL OUTER JOIN here.

You can just Full Outer Join on the PK, preserve rows with at least one difference with WHERE EXISTS (SELECT A.* EXCEPT SELECT B.*) and use CROSS APPLY (SELECT A.* UNION ALL SELECT B.*) to unpivot out both sides of the JOINed rows into individual rows.

WITH TableA(Col1, Col2, Col3) 
     AS (SELECT 'Dog',1,1     UNION ALL 
         SELECT 'Cat',27,86   UNION ALL 
         SELECT 'Cat',128,92), 
     TableB(Col1, Col2, Col3) 
     AS (SELECT 'Dog',1,1     UNION ALL 
         SELECT 'Cat',27,105  UNION ALL 
         SELECT 'Lizard',83,NULL) 
SELECT CA.*
FROM   TableA A 
       FULL OUTER JOIN TableB B 
         ON A.Col1 = B.Col1 
            AND A.Col2 = B.Col2 
/*Unpivot the joined rows*/
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
             SELECT 'TableB' AS what, B.*) AS CA     
/*Exclude identical rows*/
WHERE  EXISTS (SELECT A.* 
               EXCEPT 
               SELECT B.*) 
/*Discard NULL extended row*/
AND CA.Col1 IS NOT NULL      
ORDER BY CA.Col1, CA.Col2

Gives

what   Col1   Col2        Col3
------ ------ ----------- -----------
TableA Cat    27          86
TableB Cat    27          105
TableA Cat    128         92
TableB Lizard 83          NULL

Or a version dealing with the moved goalposts.

SELECT DISTINCT CA.*
FROM   TableA A 
       FULL OUTER JOIN TableB B 
         ON EXISTS (SELECT A.*  INTERSECT  SELECT B.*) 
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
             SELECT 'TableB' AS what, B.*) AS CA     
WHERE NOT EXISTS (SELECT A.*  INTERSECT  SELECT B.*) 
AND CA.Col1 IS NOT NULL
ORDER BY CA.Col1, CA.Col2

For tables with many columns it can still be difficult to identify the specific column(s) that differ. For that you can potentially use the below.

(though just on relatively small tables as otherwise this method likely won't have adequate performance)

SELECT t1.primary_key,
       y1.c,
       y1.v,
       y2.v
FROM   t1
       JOIN t2
         ON t1.primary_key = t2.primary_key
       CROSS APPLY (SELECT t1.*
                    FOR xml path('row'), elements xsinil, type) x1(x)
       CROSS APPLY (SELECT t2.*
                    FOR xml path('row'), elements xsinil, type) x2(x)
       CROSS APPLY (SELECT n.n.value('local-name(.)', 'sysname'),
                           n.n.value('.', 'nvarchar(max)')
                    FROM   x1.x.nodes('row/*') AS n(n)) y1(c, v)
       CROSS APPLY (SELECT n.n.value('local-name(.)', 'sysname'),
                           n.n.value('.', 'nvarchar(max)')
                    FROM   x2.x.nodes('row/*') AS n(n)) y2(c, v)
WHERE  y1.c = y2.c
       AND EXISTS(SELECT y1.v
                  EXCEPT
                  SELECT y2.v)

Sql-server – Microsoft SQL server 2008 and linked servers

Using loop back linked servers isn't the greatest idea. This can lead to some interesting performance problems once you start doing joins between local tables and "remote" tables as you are going through a linked server so suddenly instead of doing a normal join you may end up transferring the entire table across the linked server (which means possibly reading it off of the disk), stuffing it into tempdb and joining to a hash table which doesn't have any indexes on it.

Don't setup linked servers unless you actually need them. You can get some very powerful servers these days for not much money (I'm currently working on a server which has 32 cores and 256 Gigs of RAM) so scaling a server up isn't all that hard. If the application actually needs to be scaled out across multiple servers you'll need someone who is used to doing very complex tuning to help out to make sure that you aren't shooting yourself in the foot in the process (which isn't hard to do).

Once you do get to needing linked servers, there's going to be so much stuff that needs to be done to move the database and test everything going through and fixing the code isn't going to seem all that bad.

Best Answer

Related Solutions

SQL Server Row Differences – How to Show Rows Different Between Two Tables or Queries

Sql-server – Microsoft SQL server 2008 and linked servers

Related Question