Help with removing duplicate reversed pairs in relational algebra

relational-theory

I have a listing of data that contains duplicate reversed pairs and I need to remove them.. Call them, name1 and name2. Where I have Tom and Mike, and Mike and Tom, such that this single pair is being counted twice.

 name1 | name2
-------|-------
 Tom   | Mike
 Pete  | Jenny
 Bill  | Jenny
 Joe   | Mary
 Mike  | Tom
 Jenny | Pete
 Jenny | Bill
 Mary  | Joe
 Linda | Jenny

The List was a product of an initial match of student with guidance counselor, then the product of the student/counselor and counselor table, which resulted in a longer list and was able to reduce to the pairs above. But, now can't get rid of the duplicates.

While I might have made a mistake in combing the tables that made the pairing, I am stuck with that table for now.. This is a listing of student paired with guidance counselors.

Is there a way to de-dup the list, or do I need to start over?

Best Answer

Do you need to return an existing combination, e.g. if there's only Tom,Mike do you need to return exactly this or is Mike,Tom also ok?

-- order of columns doesn't matter
SELECT DISTINCT
   CASE WHEN name1 > name2 THEN name2 ELSE name1 END as name1,
   CASE WHEN name1 < name2 THEN name2 ELSE name1 END as name2
FROM tab;

-- order of columns is maintained
SELECT DISTINCT name1,name2 -- DISTINCT might not be needed
FROM tab AS t1
WHERE NOT EXISTS(
  SELECT * FROM tab AS t2
  WHERE t1.name1 = t2.name2
    AND t1.name2 = t2.name1
    AND t1.name1 > t2.name1)
;

See fiddle

If you want to delete those rows you might us the 2nd logic, the actual syntax depends on your DBMS:

DELETE --change to SELECT * to see which rows will be deleted
FROM tab AS t1
WHERE EXISTS(
  SELECT * FROM tab AS t2
  WHERE t1.name1 = t2.name2
    AND t1.name2 = t2.name1
    AND t1.name1 > t2.name1)
;