I have a table that has a unique ID, name, email, and some other fields. I want to match this table against another table on email, but sometimes emails are shared. The unique ID is not shared among tables, but I will input the ID into the other table when I have identified them. Name is not unique.
I'd like to build two rounds of joins. The first round I would like to match on email and append the ID, and the second round would match on ID and update fields as necessary.
Is there a way to query into the table and grab only those that have unique emails, while ignoring those that share emails? I could then append the ID and re-query those that don't have IDs and join on the name. Would that even work?
Would this be a good use of a partition? I don't understand those well.
Best Answer
I can think of a couple ways to do this:
1) If I restate this, I might say "Get all of the email addresses that have only 1 entry in the table, and exclude email addresses having more than one entry in the table. Use
GROUP BY...HAVING
to identify the non-duplicated email addresses, grouping by the address, and literally looking for aCOUNT
of 1 (or > 1):This would literally give you the list of email addresses, which you would then need to do stuff with.
2) If I restate your question another way, "Get all of the emails where there doesn't exist another row with the same email address. Use 'EXISTS' and 'NOT EXISTS' and a correlated subquery to identify rows with or without duplicates.
Personally, I'd go with the second approach, as I think the logic is easier to follow, and I suspect it would perform better (though, that's just a guess).