Mysql – effective way to delete duplicate records based on multiple columns in MySQL

duplicationinnodbMySQLmysql-5.5

I have a table named users having approx 70 Million records, out of which I guess 10 Million is duplicate. How to remove those 10 Million records keeping one copy of that.

Two records are said duplicate if both have same email, alt_email, mobile and alt_mobile.

I searched on stackexchange, but couldn't found any solution effective with such a huge data. Any help would be appreciated. My table is in MySQL 5.5.45 with InnoDB engine.

This is the users table. Apart from mentioned columns, I have few more columns but the mentioned ones are relevant to the question.

CREATE TABLE users
(
  id int(11) NOT NULL AUTO_INCREMENT, 
  email varchar(200) DEFAULT NULL, 
  alt_email varchar(200) DEFAULT NULL, 
  mobile varchar(100) DEFAULT NULL, 
  alt_mobile varchar(100) DEFAULT NULL, 
  PRIMARY KEY (id), 
  KEY mobile (mobile), 
  KEY email (email)
) ENGINE=InnoDB;

Best Answer

Plan A:

ALTER IGNORE TABLE users
    ADD UNIQUE(email, alt_email, mobile and alt_mobile);

This will go through the table, copy it over, ignoring any dups. I am, however, hesitant to say that it will properly handle NULLs in those columns.

Plan B:

CREATE TABLE new LIKE users;
ALTER TABLE new ADD UNIQUE(email, alt_email, mobile and alt_mobile);
INSERT INTO new
    SELECT * FROM users
        GROUP BY (email, alt_email, mobile and alt_mobile);
check the contents of `new` to see if it worked correctly, esp for NULLs
RENAME TABLE users TO old, new to users;
DROP TABLE old;

Plan C: Do some other query, but use <=> instead of = when comparing potentially NULL columns. Documentation.