MySQL – How to Remove Duplicate Records When Another Column is Empty

MySQL

I have a table with many duplicate records . My table looks like this:

ID |First_Name     | Last_Name     |City
1  |    Alan       | Smith     |Los Angeles
2  |    Alan       | Smith     |
3  |    Alan       | Smith     |New York
4  |    Alan       | Smith     |
5  |    Alan       | Smith     |Houston 

I can find duplicate records with this query:

SELECT First_Name, Last_Name, COUNT(*) 
FROM tab_Name 
GROUP BY First_Name, Last_Name 
HAVING COUNT(*) > 1 

I'd like to remove duplicate records only under following 2 conditions:

  • The First_Name and Last_Name are the same
  • the city column is empty(null).

What's the correct way to achieve this? Please let me know if more explanation is needed. Thank you!

Best Answer

I think this is what you want...

I assume your table is called t:

 CREATE TABLE t
 (
     `ID` int PRIMARY KEY, 
     `First_Name` varchar(50), 
     `Last_Name` varchar(50), 
     `City` varchar(50)
 ) ;

 INSERT INTO t
     (`ID`, `First_Name`, `Last_Name`, `City`)
 VALUES
     (1, 'Alan', 'Smith', 'Los Angeles'),
     (2, 'Alan', 'Smith', NULL),
     (3, 'Alan', 'Smith', 'New York'),
     (4, 'Alan', 'Smith', NULL),
     (5, 'Alan', 'Smith', 'Houston')
 ;

This query DELETEs the rows you don't want...

 DELETE
 FROM
     t
 WHERE
     t.id IN
     (
     SELECT 
         *
     FROM
         (
         SELECT
             id
         FROM
             t AS t1
         WHERE
             City IS NULL
             AND
             EXISTS
             (
                  SELECT 1
                    FROM t AS t2
                   WHERE     t2.`First_Name` = t1.`First_Name`
                         AND t2.`Last_Name`  = t1.`Last_Name`
                         AND t2.`City` IS NOT NULL
                         AND t2.`ID` <> t1.`ID`
             ) 
         ) AS s0 
     ) ; 

 SELECT * FROM t

 ID | First_Name | Last_Name | City
-: | :--------- | :-------- | :---------- 1 | Alan | Smith | Los Angeles 3 | Alan | Smith | New York 5 | Alan | Smith | Houston

dbfiddle here

--

NOTEs:

  1. The need of a subquery within of a subquery is a limitation of MySQL (most SQL databases don't need it). See You can't specify target table 'NAME' for update in FROM clause

  2. I've assumed that you wouldn't want to delete any row if there is not one with a City. I.e.: if you have (15, 'John', 'Doe', NULL) and (16, 'John', 'Doe', NULL) you leave both on the table.