SQL Server – How to Query Transfers for Single Source to Single Destination

sql server

This should probably be easier than I'm making it, but I'm scratching my head coming up with an efficient way to query the results I want. I'm working with a table that has a source field and a destination field (along with some other unimportant fields). What I'm looking for are unique source-destination pairs, but only ones where exactly one source matches to exactly one destination. The basic rules involved in the selection process are:

  • One source must match to exactly one destination
  • One destination can have only a single source
  • A destination must not be a source for another group
  • There may be more than one record containing the desired source-destination pair (but we only want to return a distinct pair)

So, given this dataset:

Source   Destination
Grp10    Grp20
Grp11    Grp21
Grp11    Grp22
Grp12    Grp23
Grp12    Grp23
Grp13    Grp24
Grp24    Grp25

I would want the following results:

Source   Destination
Grp10    Grp20
Grp12    Grp23

I've gotten something close with a single query, using multiple levels of SELECT DISTINCT sub-queries joining back on the original table, but the performance began to drop off drastically (> 1 second for less than 200 rows).

I'm wondering if it would more appropriate to do grouping on Source, then, on Destination, store those in temporary tables, and join on those to reduce the data to the results I'm looking for? Some feedback would be great in case I'm missing some really trivial way to get the results I'm looking for.

Best Answer

I don't know all of your source data (or why there isn't any type of unique constraint that would prevent full-on duplicates or a source with multiple destinations), but given only the sample data supplied:

;WITH s AS 
(
  -- first let's eliminate duplicates
  SELECT DISTINCT Source, Destination 
    FROM dbo.MyTable
)
SELECT Source, Destination
FROM s
WHERE NOT EXISTS
(
  SELECT 1 FROM s AS d WHERE 

  -- eliminate chains in either direction:
    d.Destination = s.Source OR d.Source = s.Destination

  -- eliminate any source with multiple destinations:
    OR (d.Source = s.Source AND d.Destination <> s.Destination)

  -- eliminate any destination with more than one source
    OR (d.Destination = s.Destination AND d.Source <> s.Source)
);

SQL fiddle demo