We have some huge tables in a MySQL database. We already archived/deleted data older than 2012 back in 2013. Now we need to archive/delete data older than 2013, which means archiving the data for the year 2012, without application downtime.
Table     Size (GB)   TABLE_ROWS    TABLE_ROWS before 2013
RTesAll   923.65      1982098430    611992998
RTest     32.1        205527090
RAdT      6.97        25324446
RAdv      4.37        28260973
So I need to delete 611992998 records from the biggest table.
We have one MySQL master and 4 MySQL slaves, and we need to delete the data from all the servers. My plan is to delete the data in chunks from the master so that the slaves don't lag too much. For that I have created a procedure (not yet tested):
DROP PROCEDURE IF EXISTS PurgeOlderData;
DELIMITER $$
CREATE PROCEDURE `PurgeOlderData`(IN StartDate DATETIME, IN EndDate DATETIME,
                                  IN NoOfRecordsToDelete BIGINT, IN TableName CHAR(50))
BEGIN
  SET @delete_counter = 0;
  SET @table_name = TableName;
  SET @number_of_records_to_delete = NoOfRecordsToDelete;
  SET @start_date = StartDate;
  SET @end_date = EndDate;
  WHILE @delete_counter < @number_of_records_to_delete DO
    SET @varSQL = CONCAT('DELETE FROM ', @table_name,
                         ' WHERE recordDate BETWEEN \'', @start_date,
                         '\' AND \'', @end_date, '\' LIMIT 5000');
    PREPARE stmt FROM @varSQL;
    EXECUTE stmt;
    /* SELECT @varSQL; -- uncomment for debugging */
    /* Stop early once the date range is empty instead of issuing no-op DELETEs */
    IF ROW_COUNT() = 0 THEN
      SET @delete_counter = @number_of_records_to_delete;
    ELSE
      SET @delete_counter = @delete_counter + 5000;
    END IF;
    DEALLOCATE PREPARE stmt;
  END WHILE;
END $$
DELIMITER ;
I have used the variable @number_of_records_to_delete in the procedure so that I can pass in the number of records I am targeting to delete.
Sample call statement:
CALL PurgeOlderData('2012-01-01 00:00:00','2012-01-05 00:00:00',100000,'RTesAll');
I can increase the value of @number_of_records_to_delete accordingly.
Why DELETE with LIMIT 5000:
I will first test how long a DELETE with LIMIT 5000 takes. If it is OK (it executes in 1-3 seconds and the slaves are also fine), I can increase the DELETE LIMIT to 10000; if that is also OK, I can increase it some more.
Table structure of the huge table:
CREATE TABLE `RTesAll` (
  `recordDate` datetime NOT NULL DEFAULT '0000-00-00 00:00:00',
  `networkId` bigint(11) NOT NULL DEFAULT '0',
  `publisherId` bigint(11) NOT NULL DEFAULT '0',
  `feedId` bigint(11) NOT NULL DEFAULT '0',
  `subPublisherId` bigint(11) NOT NULL DEFAULT '0',
  `subId` varchar(100) NOT NULL DEFAULT '',
  `searches` bigint(20) DEFAULT NULL,
  `matches` bigint(20) DEFAULT NULL,
  PRIMARY KEY (`recordDate`,`networkId`,`publisherId`,`feedId`,`subPublisherId`,`subId`),
  KEY `K_networkId` (`networkId`),
  KEY `K_publisherId` (`publisherId`),
  KEY `K_feedId` (`feedId`),
  KEY `K_subPublisherId` (`subPublisherId`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1
So my questions are:
- Will the approach I plan to use work, and is it an optimized way to go? We don't want any issues on the servers (like slave lag) due to the deletions.
- What would be the best approach to do this without issues and without downtime?
- How can I use pt-archiver to do this?
Best Answer
Here are some things you should consider:
CONSIDERATION #1
Using
DELETE FROM tbl ... LIMIT 5000
is not replication-safe, because the 5000 rows being deleted may not be in the same order, or even be the same set of 5000 rows, on a slave. See the MySQL Documentation (Replication and LIMIT). Should a DELETE on the master get interrupted, it has to roll back. In that event, it is remotely possible that the number of times DELETE FROM tbl ... LIMIT 5000 gets called will differ between the master and a slave. Extra calls of the DELETE on a slave are not critical; too few calls could be bad. This should not happen.
SUGGESTION : Make sure the row count for RTesAll is identical on the master and all slaves.
CONSIDERATION #2
Since you are running a stored procedure on the master, each DELETE will not begin executing on a slave until that DELETE has completed on the master and has been posted to the master's binary logs. With one master and 4 slaves (5 DB servers), every call to DELETE is effectively multiplied by 5.
SUGGESTION #1 : Add set sql_log_bin = 0; to the stored procedure to prevent the DELETEs from replicating. If you are deleting 611992998 rows, deleting 5000 rows at a time requires executing DELETE ... LIMIT 5000 122399 times (that's 611992998 / 5000 rounded up to the next integer). Disabling sql_log_bin also saves writing to the binlogs 122399 times.
SUGGESTION #2 : Run the Stored Procedure on the Master and Slaves at the same time.
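A minimal sketch of how SUGGESTION #1 and SUGGESTION #2 could be combined with the procedure from the question. The two SET SESSION sql_log_bin lines are the only additions, and the procedure name `PurgeOlderDataNoBinlog` is hypothetical; you would run this same procedure separately on the master and on each slave:

```sql
DELIMITER $$
CREATE PROCEDURE `PurgeOlderDataNoBinlog`(IN StartDate DATETIME, IN EndDate DATETIME,
                                          IN NoOfRecordsToDelete BIGINT, IN TableName CHAR(50))
BEGIN
  SET @delete_counter = 0;
  -- Keep these DELETEs out of the binary log so they do not replicate.
  -- Requires the SUPER privilege and affects only this session.
  SET SESSION sql_log_bin = 0;
  WHILE @delete_counter < NoOfRecordsToDelete DO
    SET @varSQL = CONCAT('DELETE FROM ', TableName,
                         ' WHERE recordDate BETWEEN \'', StartDate,
                         '\' AND \'', EndDate, '\' LIMIT 5000');
    PREPARE stmt FROM @varSQL;
    EXECUTE stmt;
    DEALLOCATE PREPARE stmt;
    SET @delete_counter = @delete_counter + 5000;
  END WHILE;
  SET SESSION sql_log_bin = 1;  -- restore binary logging for this session
END $$
DELIMITER ;
```

Since nothing replicates, the chunked DELETEs run in parallel on all 5 servers instead of serially through the binlog stream.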
CONSIDERATION #3
Make sure
autocommit
is enabled on the master and all slaves. Otherwise, all the deletes will pile up in the undo logs inside ibdata1 before either committing all the deletes at once as a single transaction or performing a huge rollback.
Give it a Try !!!
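The autocommit check from CONSIDERATION #3 only takes a couple of statements on each server before starting the purge:

```sql
-- Verify autocommit on each server (1 = enabled)
SELECT @@global.autocommit, @@session.autocommit;

-- Enable it for the purge session if it is off, so each chunked
-- DELETE commits on its own instead of piling up in the undo logs
SET SESSION autocommit = 1;
```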
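As for the question's third point, pt-archiver can do the same chunked purge without a stored procedure. A sketch of a typical invocation, where the host names, schema, user, and password are placeholders and the chunk size and lag threshold are assumptions to tune:

```shell
# Purge pre-2013 rows from RTesAll in 5000-row chunks, committing each
# chunk, and pausing whenever the named slave lags more than 5 seconds.
pt-archiver \
  --source h=master-host,D=mydb,t=RTesAll,u=purge_user,p=secret \
  --purge \
  --where "recordDate < '2013-01-01'" \
  --limit 5000 --commit-each \
  --check-slave-lag h=slave1-host \
  --max-lag 5 \
  --progress 100000 --statistics
```

pt-archiver walks the table by primary key and deletes rows by key value, so its statements are deterministic and replication-safe; run it only on the master and let the deletes replicate, with --check-slave-lag/--max-lag throttling it whenever a slave falls behind.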