Here are some things you should consider:
CONSIDERATION #1
Using DELETE FROM tbl ... LIMIT 5000 is not replication-safe because the 5000 rows being deleted may not be in the same order, or even the same set of 5000 rows, on a Slave. See the MySQL Documentation (Replication and LIMIT). Should a DELETE on the Master get interrupted, it has to roll back. In that event, it is remotely possible that the number of times DELETE FROM tbl ... LIMIT 5000 gets called differs between the Master and a Slave. Extra calls of the DELETE on a Slave are not critical; too few calls could be bad. This should not happen.
SUGGESTION : Make sure the row count for RTesAll is identical on the Master and all Slaves.
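One way to make such a batched delete deterministic (a sketch, not from the question; tbl, its primary key id, and the recordDate column are stand-ins for your actual schema) is to add an ORDER BY on the primary key, so the Master and every Slave pick the same 5000 rows on each pass:

```sql
-- Deterministic variant: ordering by the primary key guarantees that
-- the Master and the Slaves delete the same 5000 rows in the same order.
DELETE FROM tbl
WHERE recordDate BETWEEN '2014-01-01' AND '2014-06-30'
ORDER BY id
LIMIT 5000;
```

MySQL supports ORDER BY ... LIMIT on single-table DELETEs; the ordered form is a little slower per pass but removes the replication ambiguity.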
CONSIDERATION #2
Since you are running a Stored Procedure on the Master, the DELETE command will not begin executing on a Slave until the DELETE has completed on the Master and the DELETE command has been posted in the Master's Binary Logs. With one Master and 4 Slaves (5 DB Servers), every call to DELETE is effectively executed 5 times.
SUGGESTION #1 : You should add SET sql_log_bin = 0; to the Stored Procedure
DROP PROCEDURE IF EXISTS PurgeOlderData;
DELIMITER $$
CREATE PROCEDURE `PurgeOlderData`(IN StartDate DATETIME, IN EndDate DATETIME, IN NoOfRecordsToDelete BIGINT, IN TableName CHAR(50))
BEGIN
    SET sql_log_bin = 0;
    SET @delete_counter = 0;
    SET @table_name = TableName;
    SET @number_of_records_to_delete = NoOfRecordsToDelete;
    SET @start_date = StartDate;
    SET @end_date = EndDate;
    WHILE @delete_counter < @number_of_records_to_delete DO
        SET @varSQL = CONCAT('DELETE FROM ', @table_name, ' WHERE recordDate BETWEEN \'', @start_date, '\' AND \'', @end_date, '\' LIMIT 5000;');
        PREPARE stmt FROM @varSQL;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
        /* SELECT @varSQL; */
        SET @delete_counter = @delete_counter + 5000;
    END WHILE;
    SET sql_log_bin = 1;
END $$
DELIMITER ;
to prevent all the DELETEs from replicating. If you are deleting 611992998 rows, deleting 5000 rows at a time requires executing DELETE ... LIMIT 5000 a total of 122399 times (that's 611992998 / 5000 rounded up to the next integer). This spares you from writing to the binlogs 122399 times.
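With sql_log_bin disabled inside the procedure, the DELETEs no longer replicate, so you would invoke the procedure once on each server yourself. Something like this (the dates and row count here are placeholders, not values from the question):

```sql
-- Run on the Master AND separately on each of the 4 Slaves,
-- since the procedure's DELETEs will not replicate.
CALL PurgeOlderData('2014-01-01 00:00:00',  -- StartDate
                    '2014-06-30 23:59:59',  -- EndDate
                    611992998,              -- NoOfRecordsToDelete
                    'RTesAll');             -- TableName
```

Note that SET sql_log_bin requires the SUPER privilege, so the procedure must be called by a sufficiently privileged account.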
SUGGESTION #2 : Run the Stored Procedure on the Master and Slaves at the same time.
- This prevents replication lag
- This gets rid of all the rows at the same time
- This is best done during maintenance cycles or slow read times
CONSIDERATION #3
Make sure autocommit is enabled on the Master and all Slaves. Otherwise, all the deletes will pile up in the undo logs inside ibdata1, and you end up with either one gigantic transaction committed at once or a huge rollback.
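A quick sanity check before kicking off the purge (standard MySQL commands; run on each server):

```sql
-- Verify autocommit is ON so each DELETE ... LIMIT 5000 commits on its own
SHOW VARIABLES LIKE 'autocommit';

-- If it is OFF, enable it before purging
SET SESSION autocommit = 1;
```

With autocommit on, each 5000-row DELETE is its own small transaction, keeping the undo logs small.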
Give it a Try !!!
It sounds like you are looking for a FULL [OUTER] JOIN. Per documentation:
FULL OUTER JOIN
First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row
is added with null values in columns of T2. Also, for each row of T2
that does not satisfy the join condition with any row in T1, a joined
row with null values in the columns of T1 is added.
SELECT *
FROM users u
JOIN user_products up ON u.id = up.user_id
AND u.id = 1
JOIN products p ON p.id = up.product_id
FULL JOIN categories c ON c.id = p.category_id;
This returns all categories and also returns all user-product combinations for the given user_id.
If you are enforcing referential integrity with foreign keys and products.category_id is defined NOT NULL, so that every product is assigned to an existing category (not in your question), you can replace the FULL JOIN with a RIGHT JOIN.
Plus, either way, the condition to select a specific user has to move to a subquery or (simpler) to a JOIN condition between the first three tables. It has to be applied before the last table categories is joined.
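Under those assumptions (foreign key in place, products.category_id NOT NULL), the rewrite could look like this sketch:

```sql
SELECT *
FROM   users u
JOIN   user_products up ON up.user_id = u.id
                       AND u.id = 1          -- user filter applied before categories join
JOIN   products p       ON p.id = up.product_id
RIGHT  JOIN categories c ON c.id = p.category_id;  -- keeps every category
```

The RIGHT JOIN preserves all rows from categories, while categories without a matching product for this user come back with NULLs in the columns of the first three tables.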
SQL Fiddle.
Best Answer
This workaround is OK for my case but isn't great, since it needs to scan the entire index of the User table, so it wouldn't be very portable. The user:product ratio can be up to 1:1000, and it's feasible to scan the User table first using a covering index before doing the JOIN.
My takeaway is: if you can reasonably scan the entire primary-key table quickly using indexes, and join with the foreign-key table afterwards (ideally also using indexes), that is a solution that ensures you will be able to find ALL foreign key references when you want to batch DELETE in situations where many foreign keys reference each primary key. In this case, the User table was ~1 GB and the Product table ~350 GB.
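That approach could be sketched like this (MySQL-style syntax; the table names, the user_id foreign key, and the purge condition are illustrative, not from the original schema):

```sql
-- Sketch: scan the small User table via an index to pick a batch of ids,
-- then delete the matching rows from the large Product table.
-- Repeat until the DELETE affects 0 rows.
DELETE FROM product
WHERE user_id IN (
    SELECT id FROM (
        SELECT id
        FROM   user
        WHERE  is_deleted = 1   -- hypothetical flag marking rows to purge
        ORDER  BY id
        LIMIT  1000             -- batch size; tune for your workload
    ) AS batch                  -- derived table works around MySQL's
);                              -- "LIMIT in IN subquery" restriction
```

Batching by ids from the small table keeps each transaction bounded, and an index on product.user_id makes each pass an index lookup rather than a scan of the 350 GB table.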