Mysql – Finding Duplicates in LEFT JOIN

join;MySQL

Here is my table structure:

create table table_a (emp_name varchar(100), dept_name varchar(100));
insert into table_a values ('amar', 'finance');
insert into table_a values ('akbar', 'human_resource');
insert into table_a values ('amar', 'finance');
insert into table_a values ('anthony', 'information_technology');

create table table_b (emp_name varchar(100), dept_name varchar(100));
insert into table_b values ('amar', 'finance');
insert into table_b values ('akbar', 'human_resource');

mysql> select * from table_a as a left join table_b as b on a.emp_name = b.emp_name and a.dept_name = b.dept_name where b.dept_name is null;
+----------+------------------------+----------+-----------+
| emp_name | dept_name              | emp_name | dept_name |
+----------+------------------------+----------+-----------+
| anthony  | information_technology | NULL     | NULL      |
+----------+------------------------+----------+-----------+
1 row in set (0.00 sec)

The left join mentioned above works as expected.

However, what I need is one record for "amar" as well because there are 2 entries for amar in table A and only 1 entry in table B. So, the result should look like this…

+----------+------------------------+----------+-----------+
| emp_name | dept_name              | emp_name | dept_name |
+----------+------------------------+----------+-----------+
| anthony  | information_technology | NULL     | NULL      |
| amar     | finance                | NULL     | NULL      |
+----------+------------------------+----------+-----------+
2 row in set (0.00 sec)

I need this in an environment where count accuracy is more important than data integrity. The table B record number (2) + query results (2) should match with the table A record numbers (4)

Best Answer

This may appear extrememly convoluted by here is the query you need

select * from
(select count(1) dupcount,emp_name,dept_name from table_a group by emp_name,dept_name) as a
left join
(select count(1) dupcount,emp_name,dept_name from table_b group by emp_name,dept_name) as b
on a.emp_name = b.emp_name and a.dept_name = b.dept_name where a.dupcount <> IFNULL(b.dupcount,0);

I loaded you sample data and I got this:

mysql> select * from
    -> (select count(1) dupcount,emp_name,dept_name from table_a group by emp_name,dept_name) as a
    -> left join
    -> (select count(1) dupcount,emp_name,dept_name from table_b group by emp_name,dept_name) as b
    -> on a.emp_name = b.emp_name and a.dept_name = b.dept_name where a.dupcount <> IFNULL(b.dupcount,0);
+----------+----------+------------------------+----------+----------+-----------+
| dupcount | emp_name | dept_name              | dupcount | emp_name | dept_name |
+----------+----------+------------------------+----------+----------+-----------+
|        2 | amar     | finance                |        1 | amar     | finance   |
|        1 | anthony  | information_technology |     NULL | NULL     | NULL      |
+----------+----------+------------------------+----------+----------+-----------+
2 rows in set (0.02 sec)

mysql>

Here is it without the dupcount values

mysql> select a.emp_name,a.dept_name,b.emp_name,b.dept_name from
    -> (select count(1) dupcount,emp_name,dept_name from table_a group by emp_name,dept_name) as a
    -> left join
    -> (select count(1) dupcount,emp_name,dept_name from table_b group by emp_name,dept_name) as b
    -> on a.emp_name = b.emp_name and a.dept_name = b.dept_name where a.dupcount <> IFNULL(b.dupcount,0);
+----------+------------------------+----------+-----------+
| emp_name | dept_name              | emp_name | dept_name |
+----------+------------------------+----------+-----------+
| amar     | finance                | amar     | finance   |
| anthony  | information_technology | NULL     | NULL      |
+----------+------------------------+----------+-----------+
2 rows in set (0.00 sec)

mysql>

Give it a Try !!!