Mysql – Delete millions of rows based on ids in another table

deleteMySQL

I have two tables on MySQL:

  • HISTORY_TABLE (4 million rows) (column1 indexed)
  • EXTRACT_TABLE (600K rows) (column1 indexed)

For every EXTRACT_TABLE.column1 there are multiple HISTORY_TABLE.column1 rows.

Objective: I want to delete all records that MATCH criteria:

HISTORY_TABLE.column1 = EXTRACT_TABLE.column1

I tried two different possibilities:

  • Simple query that matches criteria (10 hours)
  • Copying the NOT EXISTS rows into a new table (2days+)

I found via searching that deletion can be done in chunks with a stored procedure, but I don't know how.

Is there an example on how to do a cursor based on EXTRACT_TABLE to read in chunks and delete the other table? Or is there other way to accomplish my objective?

Best Answer

Since you are deleting a large percentage of the table, it would be better to build a new table with the rows to keep:

CREATE TABLE new LIKE real;
INSERT INTO new
    SELECT * FROM real
        LEFT JOIN extract  ON ...
        WHERE ... IS NULL;
RENAME TABLE real TO old,
             new TO real;    -- swap tables (sort of)
DROP TABLE old;              -- clean up

Even that may be too invasive. See the following for techniques for deleting "in chunks". They won't finish "fast", but they won't have nearly as much impact on other activity. http://mysql.rjweb.org/doc.php/deletebig

(FOREIGN KEYs and TRIGGERs are likely to cause trouble.)