I have a table named users
having approx 70 Million records, out of which I guess 10 Million is duplicate. How to remove those 10 Million records keeping one copy of that.
Two records are said duplicate if both have same email, alt_email, mobile
and alt_mobile
.
I searched on stackexchange, but couldn't found any solution effective with such a huge data. Any help would be appreciated. My table is in MySQL 5.5.45 with InnoDB engine.
This is the users table. Apart from mentioned columns, I have few more columns but the mentioned ones are relevant to the question.
CREATE TABLE users
(
id int(11) NOT NULL AUTO_INCREMENT,
email varchar(200) DEFAULT NULL,
alt_email varchar(200) DEFAULT NULL,
mobile varchar(100) DEFAULT NULL,
alt_mobile varchar(100) DEFAULT NULL,
PRIMARY KEY (id),
KEY mobile (mobile),
KEY email (email)
) ENGINE=InnoDB;
Best Answer
Plan A:
This will go through the table, copy it over, ignoring any dups. I am, however, hesitant to say that it will properly handle NULLs in those columns.
Plan B:
Plan C: Do some other query, but use
<=>
instead of=
when comparing potentially NULL columns. Documentation.