Sql-server – DISTINCT Query on an INNER JOIN

join;sql serversql-server-2008t-sql

I have two tables, one (ARTISTS) with a list of different artists, details and biographies, but no images. I have another table (ARCHIVECATALOGUES), full of auction results, which have the images I need. I want to generate a query which will give me all of the detail from the ARTISTS table, and include an image (does not matter which one) from the other table for each of the artists.

So I need a query which INNER JOINs the two tables. The query below is pulling every result from the ARCHIVESCATALOGUE table, i.e. multiple results per artist. But I only need one result per artist. I used DISTINCT on the artist.ArtistID field, but to no avail. Here is the code:

SELECT DISTINCT artist.ArtistID
    ,ArchiveCatalogues.IMAGER
    ,ArchiveCatalogues.AUCTION
    ,artist.surname
    ,artist.firstnames
    ,artist.dates
    ,artist.honorific
    ,artist.biog
FROM artist
INNER JOIN ArchiveCatalogues ON (ArchiveCatalogues.ARTIST = artist.surname)
    AND (ArchiveCatalogues.FIRSTNAME = artist.firstnames)
WHERE artist.surname >= H *
    AND artist.surname < I
GROUP BY artist.surname
    ,artist.firstnames
    ,artist.dates
    ,artist.honorific
    ,artist.biog
    ,ArchiveCatalogues.AUCTION
    ,artist.ArtistID
    ,ArchiveCatalogues.IMAGER

Perhaps I should be using OUTER JOIN?

Best Answer

First (common misunderstanding), distinct is not applied to individual columns, what you get is distinct rows. This is exactly the same as your GROUP BYdoes, so distinct is redundant.

Second, you need to determine which row among duplicates that you are interested in (or get a random one). You can use window functions to achieve this by enumerating ArchiveCatalogues per Artists:

 SELECT artist.ArtistID
    ,ArchiveCatalogues.IMAGER
    ,ArchiveCatalogues.AUCTION
    ,artist.surname
    ,artist.firstnames
    ,artist.dates
    ,artist.honorific
    ,artist.biog
    , row_number() over (partition by artist.ArtistID
                      -- add wanted order her as:
                      -- order by ...
                        ) as rn
FROM artist
JOIN ArchiveCatalogues 
    ON ArchiveCatalogues.ARTIST = artist.surname
   AND ArchiveCatalogues.FIRSTNAME = artist.firstnames
WHERE artist.surname >= H *
  AND artist.surname < I

Note that I removed the group by since I'm not sure what the purpose was. Now you can select the first row from there:

SELECT ArtistID
     , IMAGER
     , AUCTION
     , surname
     , firstnames
     , dates
     , honorific
     , biog
FROM (
    SELECT artist.ArtistID
        ,ArchiveCatalogues.IMAGER
        ,ArchiveCatalogues.AUCTION
        ,artist.surname
        ,artist.firstnames
        ,artist.dates
        ,artist.honorific
        ,artist.biog
        , row_number() over (partition by artist.ArtistID
                          -- add wanted order her as:
                          -- order by ...
                            ) as rn
    FROM artist
    JOIN ArchiveCatalogues 
        ON ArchiveCatalogues.ARTIST = artist.surname
       AND ArchiveCatalogues.FIRSTNAME = artist.firstnames
    WHERE artist.surname >= H *
      AND artist.surname < I
) as T
WHERE rn = 1;

The join looks strange but it's hard to tell without knowing what the tables and their key's look like.