Db2 – Finding rows that are not duplicate

db2

I have two DB2 tables.

Table 1, called EmployeeFile

EmployeeFile has the following columns: EMP_ID, FNAME, LNAME, DOB

The data looks like this:

   EMP_ID, FNAME, LNAME, DOB
 - GB2342, John,  Smith, 1970-12-10
 - L3243C, Jane,  Jones, 1969-11-09

Table 2, called OriginalData

OriginalData has the following columns: INTERNAL_ID, FNAME, LNAME, DOB.

The data looks like this:

   INTERNAL_ID, FNAME, LNAME,    DOB
 - 6575678588,  Jane,  Jones,    1969-11-09
 - 1232145564,  Rob,   Mitchell, 1968-10-08
 - 3452345345,  John,  Smith,    1970-12-10
 - 6786478568,  John,  Smith,    1970-12-10

I want to show all rows of EmployeeFile for which the (FNAME, LNAME, DOB) combination occurs only once in OriginalData.

For example, if I ran the query on the example dataset shown above, it should return just the following row:

  • Jane, Jones, 1969-11-09

I don't want 'John Smith' because although he exists in EmployeeFile, his (FNAME, LNAME, DOB) combination occurs twice in OriginalData.

I don't want 'Rob Mitchell' because he doesn't exist in EmployeeFile.

Question

How do I do this?

The only way I can think of is immensely ugly: Concatenate (FNAME, LNAME, DOB) (with some delimiter like ~) from both tables, do a join on that column, do a group by having count(that concatenated column) = 1, then do another select on EmployeeFile, selecting only those rows where the (FNAME, LNAME, DOB) match the parsed substrings of that concatenated string!

Best Answer

Pretty straightforward:

select e.* 
from EmployeeFile e
inner join
(
  select FNAME, LNAME, DOB
  from OriginalData
  group by FNAME, LNAME, DOB
  having count(1) = 1
) o
on (e.FNAME, e.LNAME, e.DOB) = (o.FNAME, o.LNAME, o.DOB)