Mysql – Deleting all duplicates

deleteduplicationMySQL

I am trying to delete all the duplicates but keeping single record only (shorter id).
Following query deletes duplicates but take lot of iterations to delete all copies and keeping original ones.

DELETE FROM emailTable WHERE id IN (
 SELECT * FROM (
    SELECT id FROM emailTable GROUP BY email HAVING ( COUNT(email) > 1 )
 ) AS q
)

Its MySQL.

DDL

CREATE TABLE `emailTable` (
 `id` mediumint(9) NOT NULL auto_increment,
 `email` varchar(200) NOT NULL default '',
 PRIMARY KEY  (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=298872 DEFAULT CHARSET=latin1

Best Answer

Try this:

DELETE FROM emailTable WHERE NOT EXISTS (
 SELECT * FROM (
    SELECT MIN(id) minID FROM emailTable    
    GROUP BY email HAVING COUNT(*) > 0
  ) AS q
  WHERE minID=id
)

The above worked for my test of 50 emails (5 different emails duplicated 10 times).

You might need to add an index on the 'email' column:

ALTER TABLE emailTable ADD INDEX ind_email (email);

It might be a bit slow fro 250,000 rows. It was slow for me on a table that had 1.5million rows (properly indexed), which is how I came up with this strategy:

/* CREATE MEMORY TABLE TO HOUSE IDs of the MIN */
CREATE TABLE email_min (minID INT, PRIMARY KEY(minID)) ENGINE=Memory;

/* INSERT THE MINIMUM IDs */
INSERT INTO email_min SELECT id FROM email
    GROUP BY email HAVING MIN(id);

/* MAKE SURE YOU HAVE RIGHT INFO */
SELECT * FROM email 
 WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id)

/* DELETE FROM EMAIL */
DELETE FROM email 
 WHERE NOT EXISTS (SELECT * FROM email_min WHERE minID=id)

/* IF ALL IS WELL, DROP MEMORY TABLE */
DROP TABLE email_min;

The benefit of the memory table is there's an index that is used (primary key on minID) that speeds up the process over a normal temporary table.