MySQL – Delete from Table Using FK Columns, JOIN, and LIMIT

deleteforeign keyjoin;limitsmysql-5.6

This query below will never complete, since it always looks at the first 100,000 (user_type, user_ids). How can I write this to delete 100,000 records from the table with the FK reference (Product) to the UserTable table? In other words, the limit should happen after the join, but preferably in an efficient way? UserTable has an index on (user_type, user_id) and (shard_key) and Product has an index on (user_type, user_id)

It doesn't necessarily have to be exactly 100,000 – for example, each user could own up to 100 products and that would be fine if the records deleted were (avg. # of products * 100,000)

DELETE p.*
FROM (
    SELECT user_type, user_id
    FROM UserTable
    WHERE MOD(shard_key,20) != 7
    LIMIT 100000
) t
JOIN Product p ON (
    t.user_type = p.user_type AND t.user_id = p.user_id
)

EDIT: This is MySQL 5.6

Best Answer

This workaround is OK for my case but isn't great, since it needs to scan the entire index of the User table, so wouldn't be very portable. The user:product ratio can be up to 1:1000, and it's feasible to scan the User table first using a covering index before doing the JOIN.

DELETE 
FROM Product f 
JOIN ( 
    SELECT t.* FROM (
        SELECT DISTINCT user_type, user_id
        FROM UserTable
        WHERE MOD(shard_key,20) != 7
        ) p
        JOIN Product t ON (
            t.user_type = p.user_type 
            AND 
            t.user_id = p.user_id
        ) LIMIT 10 
    ) x 
ON 
x.user_type = f.user_type 
AND 
x.user = f.user_id;

My takeaway is: If you can reasonably scan the entire primary key table quickly using indexes, and join with the foreign key table afterwards (ideally also using indexes), that is a solution that ensures you will be able to find ALL foreign key references if you want to batch DELETE in situations where many foreign keys reference each primary key.In this case, the User table was ~1GB and the Product table ~350GB

CONSIDERATION #1

Using DELETE FROM tbl ... LIMIT 5000 is not replication-safe because the 5000 rows being deleted may not be the same order or the same set of 5000 rows on a slave. See MySQL Documentation (Replication and LIMIT). Should a DELETE on a Master get interrupted, it has to rollback. In that event, it could be remotely possible that the number of times the DELETE FROM tbl ... LIMIT 5000 is called may not be the same. Extra calls of the DELETE on a Slave is not critical. Too few calls could be bad. This should not happen.

SUGGESTION : Make sure the row count for RTesAll is identical on the Master and all Slaves.

CONSIDERATION #2

Since you are running a Stored Procedure on the Master, the DELETE command will not begin its execution on a Slave until the completion of the DELETE on the Master and the DELETE command is posted in the Master's Binary Logs. With one Master and 4 Slaves (5 DB Servers), there are a certain of calls to DELETE multiplied by 5.

SUGGESTION #1 : You should add set sql_log_bin = 0; to the Stored Procedure

DROP PROCEDURE IF EXISTS PurgeOlderData;
DELIMITER $$
CREATE PROCEDURE `PurgeOlderData`(In StartDate DATETIME ,In EndDate DATETIME,In NoOfRecordsToDelete BIGINT,In TableName CHAR(50))
BEGIN
    SET sql_log_bin             = 0;
    SET @delete_counter             = 0;
    SET @table_name             = TableName;
    SET @number_of_records_to_delete    = NoOfRecordsToDelete;
    SET @start_date             = StartDate;
    SET @end_date               = EndDate;
    
    WHILE @delete_counter < @number_of_records_to_delete DO
        SET @varSQL = CONCAT('DELETE FROM ', @table_name,' WHERE recordDate BETWEEN \'',@start_date ,'\' AND \'', @end_date ,'\' LIMIT 5000;');
        PREPARE stmt FROM @varSQL;
        EXECUTE stmt;
        DEALLOCATE PREPARE stmt;
        /*SELECT @varSQL;*/
        SET @delete_counter = @delete_counter + 5000;
    END WHILE;          
    SET sql_log_bin             = 1;
END $$
DELIMITER ;

to prevent all DELETEs from replicating. If you are deleting 611992998 rows, deleting every 5000 rows requires executing DELETE ... LIMIT 5000 122399 times (That's 611992998 / 5000 rounded up to next integer).

This will prevent having to write to the binlogs 122399 times.

SUGGESTION #2 : Run the Stored Procedure on the Master and Slaves at the same time.

This prevents replication lag
This gets rid of all the rows at the same time
This is best done during maintenance cycles or slow read times

CONSIDERATION #3

Make sure autocommit is enabled on Master and Slaves. Otherwise, all the deletes will pile up in the undo logs inside ibdata1 before either doing all the deletes at once as a single transaction or a huge rollback.

Give it a Try !!!

PostgreSQL Joins – How to Inner Join Some Tables and Outer Join Others

It sounds like you are looking for a FULL [OUTER] JOIN. Per documentation:

FULL OUTER JOIN

First, an inner join is performed. Then, for each row in T1 that does not satisfy the join condition with any row in T2, a joined row is added with null values in columns of T2. Also, for each row of T2 that does not satisfy the join condition with any row in T1, a joined row with null values in the columns of T1 is added.

SELECT *
FROM   users           u  
JOIN   user_products   up ON u.id = up.user_id
                         AND u.id = 1
JOIN   products        p ON p.id = up.product_id
FULL   JOIN categories c ON c.id = p.category_id;

This returns all categories and also returns all user-product combinations for the given user_id.

If you are enforcing referential integrity with foreign keys and products.category_id is defined NOT NULL, so that every product is assigned to an existing category (not in your question), you can replace the FULL JOIN with a RIGHT JOIN.

Plus, either way, the condition to select a specific user has to move to a subquery or (simpler) to a JOIN condition between the first three tables. It has to be applied before the last table categories is joined.

SQL Fiddle.

Best Answer

Related Solutions

Mysql – Delete data from huge InnoDB table from MySQL Master and Slaves

CONSIDERATION #1

CONSIDERATION #2

CONSIDERATION #3

Give it a Try !!!

PostgreSQL Joins – How to Inner Join Some Tables and Outer Join Others

Related Question