MySQL – How to Remove Duplicate Records When Another Column is Empty

MySQL

I have a table with many duplicate records . My table looks like this:

ID |First_Name     | Last_Name     |City
1  |    Alan       | Smith     |Los Angeles
2  |    Alan       | Smith     |
3  |    Alan       | Smith     |New York
4  |    Alan       | Smith     |
5  |    Alan       | Smith     |Houston

I can find duplicate records with this query:

SELECT First_Name, Last_Name, COUNT(*) 
FROM tab_Name 
GROUP BY First_Name, Last_Name 
HAVING COUNT(*) > 1

I'd like to remove duplicate records only under following 2 conditions:

The First_Name and Last_Name are the same
the city column is empty(null).

What's the correct way to achieve this? Please let me know if more explanation is needed. Thank you!

Best Answer

I think this is what you want...

I assume your table is called t:

 CREATE TABLE t
 (
     `ID` int PRIMARY KEY, 
     `First_Name` varchar(50), 
     `Last_Name` varchar(50), 
     `City` varchar(50)
 ) ;

 INSERT INTO t
     (`ID`, `First_Name`, `Last_Name`, `City`)
 VALUES
     (1, 'Alan', 'Smith', 'Los Angeles'),
     (2, 'Alan', 'Smith', NULL),
     (3, 'Alan', 'Smith', 'New York'),
     (4, 'Alan', 'Smith', NULL),
     (5, 'Alan', 'Smith', 'Houston')
 ;

This query DELETEs the rows you don't want...

 DELETE
 FROM
     t
 WHERE
     t.id IN
     (
     SELECT 
         *
     FROM
         (
         SELECT
             id
         FROM
             t AS t1
         WHERE
             City IS NULL
             AND
             EXISTS
             (
                  SELECT 1
                    FROM t AS t2
                   WHERE     t2.`First_Name` = t1.`First_Name`
                         AND t2.`Last_Name`  = t1.`Last_Name`
                         AND t2.`City` IS NOT NULL
                         AND t2.`ID` <> t1.`ID`
             ) 
         ) AS s0 
     ) ; 

 SELECT * FROM t

 ID | First_Name | Last_Name | City

 -: | :--------- | :-------- | :----------
  1 | Alan       | Smith     | Los Angeles
  3 | Alan       | Smith     | New York
  5 | Alan       | Smith     | Houston

dbfiddle here

NOTEs:

The need of a subquery within of a subquery is a limitation of MySQL (most SQL databases don't need it). See You can't specify target table 'NAME' for update in FROM clause
I've assumed that you wouldn't want to delete any row if there is not one with a City. I.e.: if you have (15, 'John', 'Doe', NULL) and (16, 'John', 'Doe', NULL) you leave both on the table.

Related Solutions

Mysql – Remove duplicate terms from column

As has been mentioned, the database is perhaps not the best place to accomplish this; however, it can be done, using a stored function. You wouldn't want to rely on this in a live environment, bit it works.

If the following seems a bit primitive, that's an indication of the somewhat poor match between tool and task.

DELIMITER //

DROP FUNCTION IF EXISTS `dedup_csv` //
CREATE FUNCTION `dedup_csv` (in_str LONGTEXT) RETURNS LONGTEXT
DETERMINISTIC
NO SQL
BEGIN

-- http://dba.stackexchange.com/questions/87144/remove-duplicate-terms-from-column
-- given a comma-separated string of values, return a comma-separated string of
-- unique values found in the list

DECLARE out_str LONGTEXT DEFAULT NULL; -- pending output
DECLARE next_str TEXT DEFAULT NULL;    -- next element under consideration

dedup:
LOOP

  IF CHAR_LENGTH(TRIM(in_str)) = 0 OR in_str IS NULL THEN
    LEAVE dedup; -- no more data to consider
  END IF;

  SET next_str = SUBSTRING_INDEX(in_str,',',1);                   -- find the next element
  SET in_str = SUBSTRING(in_str FROM (CHAR_LENGTH(next_str) + 1 + 1)); -- remove that element

  SET in_str = TRIM(in_str), next_str = TRIM(next_str); -- trim the new and the rest

  IF FIND_IN_SET(next_str,out_str) OR CHAR_LENGTH(next_str) = 0 THEN -- if empty or already found
    ITERATE dedup;
  END IF;

  SET out_str = CONCAT_WS(',',out_str,next_str); -- append the new to pending output 

END LOOP;

RETURN out_str;

END //

DELIMITER ;

See this live demo: http://sqlfiddle.com/#!9/f32ee/1

The output from this function, given the input above:

Lakshmi Bar & Restaurant,Dining & Leisure,Restaurants,Restaurants- Multi Cuisine,Hotel,
East#West,Malad East,Dafftary Road,Mumbai,Maharashtra,Features,Cuisines,Services,Meal For Two,
Last Order/Booking (P.M),BookUrTableId

Leading and trailing spaces and empty values are cleaned up automatically.

mysql> SELECT dedup_csv('foo,bar,,,   ,baz,foo,  foo  , bar') AS deduped;
+-------------+
| deduped     |
+-------------+
| foo,bar,baz |
+-------------+
1 row in set (0.00 sec)

Again, the database not being the ideal tool is reflected in the benchmarks, which are highly dependent on input length and number of elements.

On the short and simple string shown above, the actual runtime on MySQL 5.5 on a 1 GHz Opteron 1218 is ~ 150 µsec, while the long string in the original question requires about 2400 µsec to process... which works out to a fairly substantial amount of time for large data sets.

Mysql – Calculations involving thresholds between two tables

SQL is not the be-all and end-all of programming languages. I would not try to do this in SQL, rather do it in some other language (Perl, PHP, Java, VB, etc).

A minor nit on the indexes of price. Is price_id ever used? Consider DROPping it and turning the UNIQUE KEY into the PRIMARY KEY.

Best Answer

Related Solutions

Mysql – Remove duplicate terms from column

Mysql – Calculations involving thresholds between two tables

Related Question