In MySQL there is a multi-table DELETE syntax. Your first DELETE will delete rows only from the players table. If you want to delete from multiple tables, you have to use something like:
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1950-01-01"
This doesn't address the problem of the long-running DELETE statement, though. In fact, the query above should take even more time, because now it actually deletes rows from the stats and photos tables as well. The workaround is to split the large DELETE into smaller ones. Since you have a nice WHERE condition, you can split the deletes on it manually (for example, one DELETE for each ten years of players.born_on) and run them in ascending order, that is:
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1930-01-01";
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1940-01-01";
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1950-01-01";
If this is too coarse (i.e. each query takes too long to execute), you should make the WHERE conditions even more fine-grained (perhaps deleting one year per chunk).
There is also a --purge option for pt-archiver from the Percona Toolkit, which splits the data to be deleted into chunks automatically, but it doesn't seem to support the multi-table case. See example usage of pt-archiver in this presentation.
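For a single-table purge, an invocation could look roughly like this (a sketch only; the host and database names are placeholders, and the --limit/--txn-size values should be tuned for your workload):

pt-archiver --source h=localhost,D=mydb,t=players \
    --where "born_on < '1950-01-01'" \
    --purge --limit 1000 --txn-size 1000

Here --limit controls how many rows are fetched per statement and --txn-size how many rows are deleted per transaction, which is what keeps each chunk small.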
From your table definitions I see that you don't have an index on players.birth_date (I suppose this is the column referred to as born_on in the example queries). This makes the decade-chunk approach useless, since every query would have to scan the whole players table.
If you can't afford having the table locked for as long as the DELETE takes to finish, you most likely can't afford to create an index on the birth_date column either.
You could split the data on another column instead; the PRIMARY KEY is a good bet. You can write a script which processes all the players in chunks of 10000 (or fewer or more, depending on how long a single DELETE statement takes):
DELETE FROM "players","stats","photos"
USING "players"
LEFT JOIN "stats"
ON "players".id = "stats".player_id
LEFT JOIN "photos"
ON "players".id = "photos".player_id
WHERE "players".born_on < "1950-01-01"
AND "players".id BETWEEN n*10000+1 AND (n+1)*10000
where n is a parameter ranging from 0 to MAX(players.id)/10000. This way you avoid a full table scan (which would certainly be painful on a table with 100M rows).
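If you want to keep the whole thing in SQL, one way to script that loop is a stored procedure. This is a sketch; the procedure name purge_old_players is made up here, and it assumes autocommit is on, so each chunk commits on its own:

DELIMITER //
CREATE PROCEDURE purge_old_players(IN chunk_size INT)
BEGIN
    DECLARE max_id BIGINT;
    DECLARE n BIGINT DEFAULT 0;
    SELECT MAX(id) INTO max_id FROM `players`;
    -- Loop over PK ranges; each DELETE is a separate statement,
    -- so under autocommit each chunk commits on its own.
    WHILE n * chunk_size <= max_id DO
        DELETE FROM `players`, `stats`, `photos`
        USING `players`
        LEFT JOIN `stats`
        ON `players`.id = `stats`.player_id
        LEFT JOIN `photos`
        ON `players`.id = `photos`.player_id
        WHERE `players`.born_on < '1950-01-01'
          AND `players`.id BETWEEN n * chunk_size + 1 AND (n + 1) * chunk_size;
        SET n = n + 1;
    END WHILE;
END //
DELIMITER ;

-- process all players in chunks of 10000
CALL purge_old_players(10000);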
You could also estimate the cost of each DELETE beforehand by running EXPLAIN on the equivalent SELECT:
EXPLAIN SELECT *
FROM `players`
LEFT JOIN `stats`
ON `players`.id = `stats`.player_id
LEFT JOIN `photos`
ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1950-01-01'
AND `players`.id BETWEEN 1 AND 10000;
Best Answer
No other cleanup should be required. If you are really worried, you can watch your database size.
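Assuming PostgreSQL (which the mentions of checkpoints and catalog bloat suggest), one way is:

-- current database size in human-readable form
SELECT pg_size_pretty(pg_database_size(current_database()));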
That way you can make sure it shrinks by about the amount you expected. It could take until the end of the next checkpoint to fully clean up.
Dropping tens of thousands of small tables is where you would have to worry about catalog bloat. One large table is not an issue.