Mysql – How to DELETE duplicate entries on a field

MySQLquery

I have a huge table that has 1,990,546 users with numerous data. Sometimes I get email lists and dump them into this table also.

I want to delete users with duplicate email addresses. However, I want to keep users WHERE chips > 0.

This query ran for over an hour before I terminated it:

SELECT id, email  
FROM users 
WHERE id IN 
    ( 
        SELECT a.id 
        FROM users a, users b     
        WHERE (a.email = b.email) 
        AND (a.chips > 0)  
    )
LIMIT 100;

Is there a faster or better way to do this?

Best Answer

I would use a temp table

#
# Collect All IDs Your Intend to Keep
#
CREATE TABLE KeepIDList ENGINE=MyISAM SELECT id FROM users;
CREATE TABLE KeepIDList ENGINE=MyISAM SELECT id FROM users;
INSERT INTO KeepIDList (id) SELECT min_id FROM
(SELECT email,MIN(id) min_id FROM users GROUP BY email) A;
ALTER TABLE KeepIDList ADD PRIMARY KEY (id);
#
# Collect All IDs Your Intend to Delete
#
CREATE TABLE ZapIDList ENGINE=MyISAM SELECT id FROM users;
INSERT INTO ZapIDList (id)
SELECT A.id FROM users A LEFT JOIN KeepIDList B WHERE B.id IS NULL;
ALTER TABLE KeepIDList ADD PRIMARY KEY (id);
#
# From Collected Keys to Delete, Remove Keys Whose chips > 0
#
DELETE B.* FROM users A INNER JOIN ZapIDList B WHERE A.chips > 0;
#
# From Collected Keys Left to Delete, Perform DELETE JOIN
#
DELETE A.* FROM users A INNER JOIN ZapIDList B;
#
# Drop Temp Tables
#
DROP TABLE KeepIDList;
DROP TABLE ZapIDList;

I have a very good reason for using temp tables as opposed to doing a Direct DELETE JOIN of a table against itself: Performing a DELETE using a subquery that references the same table experiencing the DELETE tends to make keys disappear intermittently as the query is being optimized. I wrote about this back on Feb 22, 2011 : Problem with MySQL subquery

CAVEAT

Please test this out with a smaller dataset to make sure my answer indeed performs what you asked. Make sure you make s backup of the users table. To coin a phrase from 1960's Mission Impossible: Should any of your data be caught or killed, I will disavow any knowledge of your actions (Start From 1:50 of the YouTube Video).