MySQL – Deleting 200M+ rows from an InnoDB table

delete, innodb, mysql

Basically, we have a table with the following structure:

CREATE TABLE `reqs` (
    `id` INT(10) NOT NULL AUTO_INCREMENT,
    `reqid` BIGINT(20) NOT NULL,
    `from` BIGINT(20) NOT NULL,
    `to` BIGINT(20) NOT NULL,
    `rdate` DATETIME NOT NULL,
    `state` ENUM('sent','approved','accepted','rejected') NOT NULL,
    `type` INT(11) NOT NULL,
    `subtype` INT(11) NOT NULL,
    `clusterid` INT(11) NOT NULL,
    `update_time` TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    PRIMARY KEY (`id`),
    INDEX `to_state_clusterid` (`to`, `state`, `clusterid`),
    INDEX `from_state_clusterid` (`from`, `state`, `clusterid`),
    INDEX `reqid_to` (`reqid`, `to`),
    INDEX `from_rdate_type` (`from`, `rdate`, `type`),
    INDEX `type` (`type`),
    INDEX `from_state_type_clusterid` (`from`, `state`, `type`, `clusterid`),
    INDEX `to_state_type_clusterid` (`to`, `state`, `type`, `clusterid`),
    INDEX `state` (`state`),
    INDEX `from_to_state_type_clusterid` (`from`, `to`, `state`, `type`, `clusterid`),
    INDEX `from_rdate_type_clusterid` (`from`, `rdate`, `type`, `clusterid`)
)
COLLATE='utf8_general_ci'
ENGINE=InnoDB

The table currently holds around 250M rows, and we'd like to delete 200M+ of them. This is a production database, and this particular table gets dozens of INSERT/UPDATE/SELECT queries every second.

Finally, I don't know if it changes anything, but all SELECTs against this table carry USE INDEX hints.

What would be the best way to do this with minimal impact to our application?

EDIT: There are 2 DELETE statements to be executed:

DELETE FROM `reqs` WHERE `state` = 'accepted' or `state` = 'rejected';
DELETE FROM `reqs` WHERE `rdate` < '2013-01-01 00:00:00';

These queries are expected to delete ~70M and ~130M rows respectively.

Best Answer

Any data change of this size, executed in one shot, will cause significant logging overhead and require a lot of additional log space. That in turn slows down the execution, causing an even bigger impact on the rest of the system.

The best way to handle this is to break the delete down into smaller chunks. If you delete between 1,000 and 10,000 rows at a time and then wait a few seconds, the impact on other transactions should be small. Make sure each delete runs in its own transaction so that all locks are released during the wait. Keep in mind, however, that this stretches the delete out over several hours or even days.
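As a rough sketch, you could wrap the chunking in a stored procedure like the one below (the procedure name, the 10,000-row chunk size, and the 5-second sleep are just placeholders to tune for your workload; it also assumes autocommit is enabled, so each DELETE commits and releases its locks on its own):

DELIMITER $$
CREATE PROCEDURE purge_reqs_chunked()
BEGIN
    DECLARE affected INT DEFAULT 1;
    WHILE affected > 0 DO
        -- Small chunk: short transaction, locks are released quickly
        DELETE FROM `reqs`
        WHERE `state` = 'accepted' OR `state` = 'rejected'
        LIMIT 10000;
        SET affected = ROW_COUNT();  -- rows removed by the last DELETE
        DO SLEEP(5);                 -- give other transactions room to run
    END WHILE;
END$$
DELIMITER ;

CALL purge_reqs_chunked();

The rdate-based delete can be handled the same way afterwards by swapping in the WHERE clause from your second statement.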

Also, even though it might sound counterintuitive, make sure there is an index supporting each delete so that MySQL does not have to execute a full table scan for every chunk.
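In your schema the state-based delete is already covered by the existing `state` index, but no index has `rdate` as its leading column, so the date-based delete would otherwise scan the table on every chunk. Something along these lines would help (the index name is just a suggestion, and building it on a 250M-row table has its own cost, so it is worth dropping again once the cleanup is done):

-- Temporary supporting index for the rdate-based delete
ALTER TABLE `reqs` ADD INDEX `rdate` (`rdate`);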