Mysql – Fastest way to delete matching rows in MySQL

deleteMySQL

I have more than 10 million records in a table. I want to delete the
records where the email column contains .ac..

What is the fastest method to perform the task?

Table Schema

CREATE TABLE `master` (
   `id` int(11) NOT NULL AUTO_INCREMENT,  
   `email` varchar(255) NOT NULL,
   PRIMARY KEY (`id`)
) ENGINE=MyISAM AUTO_INCREMENT=1 DEFAULT CHARSET=latin1;

Delete Query

DELETE FROM `master` WHERE ( email LIKE "%.ac.%" );

Best Answer

You have not given much of a table description. Notwithstanding, I will try to answer your question.

For this example, let's use this table:

CREATE TABLE mytable
(
    id INT NOT NULL AUTO_INCREMENT,
    email VARCHAR(255),
    PRIMARY KEY (id)
);

First, try counting all the IDs that have .ac. in the email column.

SELECT COUNT(1) INTO @CompleteCount FROM mytable;
SELECT COUNT(1) INTO @DeleteCount FROM mytable WHERE LOCATE('.ac.',email) > 0;

There are two approaches to doing a DELETE:

APPROACH #1 : DELETE JOIN

If @DeleteCount < 5% of @CompleteCount, use a list of IDs to DELETE from mytable:

CREATE TABLE mytable_delete_id
SELECT id FROM mytable WHERE LOCATE('.ac.',email) > 0;
ALTER TABLE mytable_delete_id ADD PRIMARY KEY (id);
DELETE B.* FROM mytable_delete_id A INNER JOIN mytable B USING (id);
DROP TABLE mytable_delete_id;

This helps take the guess work out of which rows to DELETE by using a SELECT to get the IDs. The JOIN takes care of the DELETE.

APPROACH #2 : CREATE A NEW TABLE

If you have the following conditions:

  • @DeleteCount > 30% of @CompleteCount
  • your table does not have foreign keys/constraints

you could just create a new table and filter out the rows with .ac. in the email column:

CREATE TABLE mytable_new LIKE mytable;
#insert rows which does not contain '.ac.' from mytable
INSERT INTO mytable_new SELECT * FROM mytable WHERE LOCATE('.ac.',email) = 0;
ALTER TABLE mytable RENAME mytable_old;
ALTER TABLE mytable_new RENAME mytable;
DROP TABLE mytable_old;

EPILOGUE

You can do either of these methods without checking table counta ahead of time. They will both work. If you are concerned that .ac. is in a huge number of rows, go with APPROACH #2. If there are not that many occurrences, go with APPROACH #1.