The rows remain visible. Test (1) shows that a LOW_PRIORITY delete does not block concurrent reads: the SELECT returns immediately with the full row count. Test (2) illustrates the table lock taken immediately by a normal delete: the SELECT waits until the delete finishes and then returns a count of zero.
Testbed and long-running query:
create database stack;
use stack;
--
create table my_table (id int auto_increment primary key, varchar_val varchar(10));
insert into my_table (varchar_val)
select 'HELLO'
from (select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) s1,
     (select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) s2,
     (select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) s3,
     (select 0 union all select 1 union all select 2 union all select 3 union all select 4 union all select 5 union all select 6 union all select 7 union all select 8 union all select 9) s4;
--
select avg(t1.id), avg(t2.id), count(*) from my_table t1 cross join my_table t2;
--
Test (1): with LOW_PRIORITY:
--session 2:
delete low_priority from stack.my_table;
/*
Query OK, 10000 rows affected (21.84 sec)
*/
--session 3:
select count(*) from stack.my_table;
/*
+----------+
| count(*) |
+----------+
|    10000 |
+----------+
1 row in set (0.00 sec)
*/
Test (2): without LOW_PRIORITY:
--session 2:
delete from stack.my_table;
/*
Query OK, 10000 rows affected (21.15 sec)
*/
--session 3:
select count(*) from stack.my_table;
/*
+----------+
| count(*) |
+----------+
|        0 |
+----------+
1 row in set (18.10 sec)
*/
In MySQL there is a multi-table DELETE syntax. Your first DELETE will delete rows only from the players
table. If you want to delete from multiple tables, you have to use something like:
DELETE FROM `players`, `stats`, `photos`
USING `players`
  LEFT JOIN `stats`
    ON `players`.id = `stats`.player_id
  LEFT JOIN `photos`
    ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1950-01-01';
This doesn't address the problem with the long-running DELETE statement, though. In fact, the above query should take even longer, because now it actually deletes rows from the stats
and photos
tables as well. The workaround is to split the large DELETE into smaller ones. Since you have a convenient WHERE condition, you can split the deletes manually on that column (for example, one DELETE for each ten years of players.born_on
) and run them in ascending order, that is:
DELETE FROM `players`, `stats`, `photos`
USING `players`
  LEFT JOIN `stats`
    ON `players`.id = `stats`.player_id
  LEFT JOIN `photos`
    ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1930-01-01';

DELETE FROM `players`, `stats`, `photos`
USING `players`
  LEFT JOIN `stats`
    ON `players`.id = `stats`.player_id
  LEFT JOIN `photos`
    ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1940-01-01';

DELETE FROM `players`, `stats`, `photos`
USING `players`
  LEFT JOIN `stats`
    ON `players`.id = `stats`.player_id
  LEFT JOIN `photos`
    ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1950-01-01';
If this is too coarse (i.e. each query takes too long to execute), make the WHERE conditions even more fine-grained (perhaps deleting one year per chunk).
There is also a --purge
option for pt-archiver from Percona Toolkit, which splits the data to be deleted into chunks automatically, but it does not seem to support the multi-table case. See the example usage of pt-archiver in this presentation.
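As a rough sketch for the single-table case (the host, database, and chunk size here are placeholders you would adapt):

```shell
# Hypothetical invocation: purges matching rows in chunks of 1000,
# committing after each chunk so locks are held only briefly.
pt-archiver \
  --source h=localhost,D=mydb,t=players \
  --purge \
  --where "born_on < '1950-01-01'" \
  --limit 1000 \
  --commit-each
```

This only deletes from players; the related stats and photos rows would still need separate handling.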
From your table definitions I see that you don't have an index on players.birth_date
(I suppose this is the column you refer to as born_on
in your example queries). This makes the decade-chunk approach useless, since every query would have to scan the whole players
table.
If you can't afford a long table lock while the DELETE finishes, you most likely can't afford to create an index on the birth_date
column either.
You could split the data on another column instead; the PRIMARY KEY is a good bet. You can write a script that processes all the players in chunks of 10000 (or fewer, or more, depending on how long a single DELETE statement takes):
DELETE FROM `players`, `stats`, `photos`
USING `players`
  LEFT JOIN `stats`
    ON `players`.id = `stats`.player_id
  LEFT JOIN `photos`
    ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1950-01-01'
  AND `players`.id BETWEEN n*10000+1 AND (n+1)*10000;
where n is a parameter ranging from 0 to MAX(players.id)/10000. This way you avoid a full table scan (which is certainly painful on a 100M-row table).
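A sketch of that loop as a MySQL stored procedure (the procedure name and the 10000 chunk size are assumptions to tune; table and column names are taken from the queries above):

```sql
DELIMITER //
CREATE PROCEDURE purge_old_players()
BEGIN
  DECLARE n INT DEFAULT 0;
  DECLARE max_id INT;
  SELECT MAX(id) INTO max_id FROM `players`;

  -- Walk the PRIMARY KEY range in chunks of 10000 ids
  WHILE n * 10000 < max_id DO
    DELETE FROM `players`, `stats`, `photos`
    USING `players`
      LEFT JOIN `stats`
        ON `players`.id = `stats`.player_id
      LEFT JOIN `photos`
        ON `players`.id = `photos`.player_id
    WHERE `players`.born_on < '1950-01-01'
      AND `players`.id BETWEEN n*10000+1 AND (n+1)*10000;
    COMMIT;  -- release locks between chunks so readers get a chance
    SET n = n + 1;
  END WHILE;
END //
DELIMITER ;
```

Each iteration touches only a bounded id range, so no single statement holds locks for long.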
You can also estimate the DELETE's complexity by running EXPLAIN on the equivalent SELECT:
EXPLAIN SELECT *
FROM `players`
  LEFT JOIN `stats`
    ON `players`.id = `stats`.player_id
  LEFT JOIN `photos`
    ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1950-01-01'
  AND `players`.id BETWEEN 1 AND 10000;
A quick solution is to populate the matching ids into a temporary table, then delete the records from each table with a join to that temporary table.
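A sketch of that approach, assuming the same players/stats/photos schema as above (the temporary table name is arbitrary):

```sql
-- Collect the target ids once, with a single scan of players
CREATE TEMPORARY TABLE tmp_ids (id INT PRIMARY KEY);
INSERT INTO tmp_ids
SELECT id FROM `players` WHERE born_on < '1950-01-01';

-- Delete from each table by joining to the id list
DELETE `stats`   FROM `stats`   JOIN tmp_ids ON `stats`.player_id = tmp_ids.id;
DELETE `photos`  FROM `photos`  JOIN tmp_ids ON `photos`.player_id = tmp_ids.id;
DELETE `players` FROM `players` JOIN tmp_ids ON `players`.id = tmp_ids.id;

DROP TEMPORARY TABLE tmp_ids;
```

The joins hit the id columns directly, so the slow born_on scan happens only once, in the INSERT.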