Ok, first off, I think you are almost there. Looking at the schema and data, one thing I have noticed is that you seem to have overlooked the fact that a friendship is bi-directional. So when you create a friend entry from a request, you also need to create one in the other direction as well:
INSERT INTO `default_friend` (`friend_id`, `user_id`, `is_suscriber`, `privacy`, `created_at`, `friend_list_id`, `approved`)
VALUES (1, 2, 1, 0, '2012-08-13 18:16:11', 0, 1);
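The two inserts (the original one and its reverse) should succeed or fail together, so it's worth wrapping them in a transaction. Here's a minimal sketch in Python, using sqlite3 as a stand-in for MySQL and a cut-down version of the `default_friend` table (only the columns needed for the illustration):

```python
import sqlite3

# Cut-down schema for illustration only; the real table has more columns.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE default_friend (
        friend_id INTEGER,
        user_id   INTEGER,
        approved  INTEGER
    )
""")

def accept_friend_request(conn, user_id, friend_id):
    """Insert both directions of the friendship as one transaction."""
    with conn:  # commits both rows, or neither, on error
        conn.execute(
            "INSERT INTO default_friend (friend_id, user_id, approved) VALUES (?, ?, 1)",
            (friend_id, user_id))
        conn.execute(
            "INSERT INTO default_friend (friend_id, user_id, approved) VALUES (?, ?, 1)",
            (user_id, friend_id))

accept_friend_request(conn, 1, 2)
rows = conn.execute(
    "SELECT user_id, friend_id FROM default_friend ORDER BY user_id").fetchall()
print(rows)  # both directions are now present
```

With a real MySQL driver the pattern is the same: two INSERTs inside one transaction, so a half-created friendship can never be left behind.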
After you have done that, your query should be much closer to the result you are after. Running this query:
select distinct u.id as `user_id`, u.username, f.id as `friend_id`, f.username as friend, s.*
from default_users as u
left join default_friend as df on df.user_id = u.id
left join default_users as f on f.id = df.friend_id
left join default_status as s on s.user_id = u.id
left join default_comment as c on c.status_id = s.status_id
order by s.status_id;
returns the following result set:
user_id | username | friend_id | friend | status_id | message                      | created_at          | privacy | user_id | is_reply | device
1       | admin    | 2         | demo   | 1         | dasdasdasdasdasd             | 2012-08-13 19:45:37 | NULL    | 1       | 0        |
2       | demo     | 1         | admin  | 2         | dasdasdasdasdasd             | 2012-08-13 19:46:03 | NULL    | 2       | 0        |
1       | admin    | 2         | demo   | 3         | dasdsad344hbvnbnhjhgjhjghjhj | 2012-08-13 21:54:53 | NULL    | 1       | 0        |
Is this anywhere near what you are looking for?
In MySQL there is a multi-table DELETE syntax. Your first DELETE will delete rows only from the players table. If you want to delete from multiple tables, you have to use something like:
DELETE FROM `players`, `stats`, `photos`
USING `players`
LEFT JOIN `stats`
  ON `players`.id = `stats`.player_id
LEFT JOIN `photos`
  ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1950-01-01';
This doesn't address the problem with the long-running DELETE statement, though. In fact, the above query should take even more time, because now it actually deletes rows from the stats and photos tables. The workaround you could use is to split the large DELETE into smaller ones. Since you have a nice WHERE condition, you could manually split the deletes on it (for example, one DELETE for each ten years of players.born_on) and run them in ascending order, that is:
DELETE FROM `players`, `stats`, `photos`
USING `players`
LEFT JOIN `stats`
  ON `players`.id = `stats`.player_id
LEFT JOIN `photos`
  ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1930-01-01';

DELETE FROM `players`, `stats`, `photos`
USING `players`
LEFT JOIN `stats`
  ON `players`.id = `stats`.player_id
LEFT JOIN `photos`
  ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1940-01-01';

DELETE FROM `players`, `stats`, `photos`
USING `players`
LEFT JOIN `stats`
  ON `players`.id = `stats`.player_id
LEFT JOIN `photos`
  ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1950-01-01';
If this is too coarse (i.e. each query still takes too long to execute), you should make the WHERE conditions even more fine-grained (perhaps deleting one year per chunk).
There is also a --purge option for pt-archiver from Percona Toolkit, which splits the data to be deleted into chunks automatically, but it doesn't seem to support the multi-table case. See the example usage of pt-archiver in this presentation.
From your table definitions I see that you don't have an index on players.birth_date (I suppose that this is the column referred to as born_on in the example queries). This makes the decade-chunk approach useless, since every query would have to scan the whole players table.
If you can't afford a long table lock while the DELETE finishes, you most likely can't afford to create an index on the birth_date column either.
You could split the data on another column, PRIMARY KEY is a good bet. You can write a script which would process all the players in chunks of 10000 (or less or more, depending on the length of a single DELETE statement):
DELETE FROM `players`, `stats`, `photos`
USING `players`
LEFT JOIN `stats`
  ON `players`.id = `stats`.player_id
LEFT JOIN `photos`
  ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1950-01-01'
  AND `players`.id BETWEEN n*10000+1 AND (n+1)*10000;
Here n would be a parameter ranging from 0 to MAX(players.id)/10000. This way you avoid a full table scan (which is certainly painful on a 100M-row table).
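The chunking loop can be scripted. Below is a minimal Python sketch that only *generates* the per-chunk statements from the table and column names used above; in a real script you would execute each string against your MySQL connection with an actual driver, and the chunk size is an assumption you should tune:

```python
def chunked_deletes(max_id, chunk_size=10000):
    """Yield one multi-table DELETE per primary-key chunk, in ascending order."""
    template = (
        "DELETE FROM `players`, `stats`, `photos` "
        "USING `players` "
        "LEFT JOIN `stats` ON `players`.id = `stats`.player_id "
        "LEFT JOIN `photos` ON `players`.id = `photos`.player_id "
        "WHERE `players`.born_on < '1950-01-01' "
        "AND `players`.id BETWEEN {lo} AND {hi}"
    )
    n = 0
    while n * chunk_size < max_id:
        yield template.format(lo=n * chunk_size + 1, hi=(n + 1) * chunk_size)
        n += 1

# With max_id = 25000 and the default chunk size, three chunks
# cover ids 1..30000.
stmts = list(chunked_deletes(25000))
print(len(stmts))  # 3
```

Running the chunks one at a time (rather than as one giant statement) keeps each lock short and lets replication keep up between chunks.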
You could also try to estimate the DELETE's complexity by running EXPLAIN on the equivalent SELECT instead:
EXPLAIN SELECT *
FROM `players`
LEFT JOIN `stats`
  ON `players`.id = `stats`.player_id
LEFT JOIN `photos`
  ON `players`.id = `photos`.player_id
WHERE `players`.born_on < '1950-01-01'
  AND `players`.id BETWEEN 1 AND 10000;
Best Answer
The following will work in Postgres. Test here: SQL-Fiddle, postgres-test. SQL Server does not have NATURAL JOIN, and MySQL has NATURAL joins but doesn't have FULL joins:
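Since MySQL lacks FULL joins, the standard workaround is to emulate a FULL OUTER JOIN as a LEFT JOIN unioned with the unmatched rows from the other side. Here's a runnable sketch (sqlite3 stands in for MySQL, and tables `a`/`b` are made up for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (id INTEGER, val TEXT);
    CREATE TABLE b (id INTEGER, val TEXT);
    INSERT INTO a VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO b VALUES (2, 'b2'), (3, 'b3');
""")

# Emulated FULL OUTER JOIN:
#   part 1: every row of a, with its match in b if any (LEFT JOIN)
#   part 2: rows of b that have no match in a (anti-join via IS NULL)
full_join = """
    SELECT a.id AS a_id, b.id AS b_id
    FROM a LEFT JOIN b ON a.id = b.id
    UNION ALL
    SELECT a.id, b.id
    FROM b LEFT JOIN a ON a.id = b.id
    WHERE a.id IS NULL
"""
rows = sorted(conn.execute(full_join).fetchall(),
              key=lambda r: (r[0] is None, r[0] or 0))
print(rows)  # matched pair, left-only row, right-only row
```

The same two-part UNION ALL runs unchanged on MySQL; UNION ALL is used because the anti-join filter already guarantees the two halves cannot overlap, so the duplicate-removal pass of plain UNION would be wasted work.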