In MySQL there is a multi-table DELETE syntax. Your first DELETE will delete rows only from the players table. If you want to delete from multiple tables you have to use something like:
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1950-01-01"
This doesn't address the problem with the long-running DELETE statement, though. In fact the above query should take even more time, because now it actually deletes rows from the stats and photos tables. The workaround is to split the large DELETE into smaller ones. Since you have a nice WHERE condition, you can split the deletes on that manually (for example, one DELETE for each ten years of players.born_on) and run them in ascending order, that is:
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1930-01-01";
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1940-01-01";
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1950-01-01";
If this is too coarse (i.e. it takes too long to execute each query), make the WHERE conditions even more fine-grained (perhaps deleting one year per chunk).
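For example, a one-year chunk would simply bound born_on on both sides (a sketch; pick the year ranges to match your data):

DELETE FROM `players`, `stats`, `photos`
USING `players`
LEFT JOIN `stats`
  ON `players`.id = `stats`.player_id
LEFT JOIN `photos`
  ON `players`.id = `photos`.player_id
WHERE `players`.born_on >= '1930-01-01'
  AND `players`.born_on <  '1931-01-01';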
There is also a --purge option for pt-archiver from the Percona Toolkit, which splits the data to be deleted into chunks automatically, but it doesn't seem to support the multi-table case. See example usage of pt-archiver in this presentation.
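For a single table, an invocation might look like the following (a sketch; the host, database name, and chunk size are assumptions to adapt, and only the players table itself would be purged):

pt-archiver --source h=localhost,D=mydb,t=players \
  --purge --where "born_on < '1950-01-01'" \
  --limit 5000 --commit-each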
From your table definitions I see that you don't have an index on players.birth_date (I suppose that this is the column you refer to as born_on in your example queries). This makes the decade-chunks approach useless, since every query would have to scan the whole players table. If you can't afford a long table lock while the DELETE finishes, you most likely can't afford to create an index on the birth_date column either.
You could split the data on another column instead; the PRIMARY KEY is a good bet. You can write a script which processes all the players in chunks of 10000 (fewer or more, depending on how long a single DELETE statement takes):
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1950-01-01"
AND "players".id BETWEEN n*10000+1 AND (n+1)*10000
where n is a parameter ranging from 0 to MAX(players.id)/10000. This way you avoid a full table scan (which would certainly be painful on a 100M-row table).
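If you would rather drive those chunks from inside MySQL itself, a minimal sketch could look like the stored procedure below (the procedure name is hypothetical, and the 10000 chunk size is the one assumed above):

DROP PROCEDURE IF EXISTS PurgePlayersByPkChunks;
DELIMITER $$
CREATE PROCEDURE PurgePlayersByPkChunks()
BEGIN
    DECLARE n BIGINT DEFAULT 0;
    DECLARE max_n BIGINT;
    -- upper bound for the chunk counter, derived from the largest player id
    SELECT CEIL(MAX(id) / 10000) INTO max_n FROM `players`;
    WHILE n < max_n DO
        -- delete one PK-range chunk of at most 10000 players plus their child rows
        DELETE FROM `players`, `stats`, `photos`
        USING `players`
        LEFT JOIN `stats`
          ON `players`.id = `stats`.player_id
        LEFT JOIN `photos`
          ON `players`.id = `photos`.player_id
        WHERE `players`.born_on < '1950-01-01'
          AND `players`.id BETWEEN n*10000+1 AND (n+1)*10000;
        SET n = n + 1;
    END WHILE;
END $$
DELIMITER ;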
You could also try to estimate the cost of the DELETE beforehand by running EXPLAIN on the equivalent SELECT instead of the DELETE:
EXPLAIN SELECT *
FROM `players`
LEFT JOIN `stats`
  ON `players`.id = `stats`.player_id
LEFT JOIN `photos`
  ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1950-01-01'
  AND `players`.id BETWEEN 1 AND 10000
Here are some things you should consider:
CONSIDERATION #1
Using DELETE FROM tbl ... LIMIT 5000 is not replication-safe, because the 5000 rows being deleted may not be in the same order, or even be the same set of 5000 rows, on a slave. See the MySQL Documentation (Replication and LIMIT). Should a DELETE on a Master get interrupted, it has to roll back. In that event, it is remotely possible that the number of times DELETE FROM tbl ... LIMIT 5000 is called differs between servers. Extra calls of the DELETE on a Slave are not critical; too few calls could be bad. This should not happen.
SUGGESTION : Make sure the row count for RTesAll is identical on the Master and all Slaves.
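If you do want a deterministic, replication-safe chunked DELETE, the MySQL documentation's recommendation is to add an ORDER BY on a unique column; a sketch, assuming RTesAll has an id primary key and using placeholder dates:

DELETE FROM RTesAll
WHERE recordDate BETWEEN '2015-01-01' AND '2015-06-30'
ORDER BY id
LIMIT 5000;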
CONSIDERATION #2
Since you are running a Stored Procedure on the Master, the DELETE command will not begin its execution on a Slave until the DELETE completes on the Master and the command is posted in the Master's Binary Logs. With one Master and 4 Slaves (5 DB Servers), there is a certain number of calls to DELETE, multiplied by 5.
SUGGESTION #1 : You should add SET sql_log_bin = 0; to the Stored Procedure
DROP PROCEDURE IF EXISTS PurgeOlderData;
DELIMITER $$
CREATE PROCEDURE `PurgeOlderData`(IN StartDate DATETIME, IN EndDate DATETIME, IN NoOfRecordsToDelete BIGINT, IN TableName CHAR(50))
BEGIN
    -- keep the deletes out of the binary logs so they do not replicate
    SET sql_log_bin = 0;
    SET @delete_counter = 0;
    SET @table_name = TableName;
    SET @number_of_records_to_delete = NoOfRecordsToDelete;
    SET @start_date = StartDate;
    SET @end_date = EndDate;
    WHILE @delete_counter < @number_of_records_to_delete DO
        -- delete the next chunk of at most 5000 rows
        SET @varSQL = CONCAT('DELETE FROM ', @table_name, ' WHERE recordDate BETWEEN \'', @start_date, '\' AND \'', @end_date, '\' LIMIT 5000;');
        PREPARE stmt FROM @varSQL;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
        /* SELECT @varSQL; */
        SET @delete_counter = @delete_counter + 5000;
    END WHILE;
    SET sql_log_bin = 1;
END $$
DELIMITER ;
to prevent all the DELETEs from replicating. If you are deleting 611992998 rows, deleting 5000 rows at a time requires executing DELETE ... LIMIT 5000 122399 times (that's 611992998 / 5000, rounded up to the next integer).
This will prevent having to write to the binlogs 122399 times.
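For reference, a run of the procedure on each server might then look like this (the date range here is only a placeholder):

CALL PurgeOlderData('2015-01-01 00:00:00', '2015-06-30 23:59:59', 611992998, 'RTesAll');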
SUGGESTION #2 : Run the Stored Procedure on the Master and Slaves at the same time.
- This prevents replication lag
- This gets rid of all the rows at the same time
- This is best done during maintenance cycles or slow read times
CONSIDERATION #3
Make sure autocommit is enabled on the Master and the Slaves. Otherwise, all the deletes will pile up in the undo logs inside ibdata1, before either being applied all at once as a single transaction or triggering a huge rollback.
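You can check this on each server before starting the purge:

SELECT @@autocommit;   -- should return 1
SET autocommit = 1;    -- enable it for the session if it is off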
Give it a Try !!!
Best Answer
The LIMIT is only going to help when there are exactly 100 matching rows, so it is probably not worth the overhead.
If you put an index on the hash column, the query should know where to start (and when to stop).
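A minimal sketch of that index, assuming the table is named t and the hashed column is named hash (both hypothetical names):

ALTER TABLE `t` ADD INDEX `idx_hash` (`hash`);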