MySQL – Delete older Records from a huge table

archivedeleteinnodbMySQL

I have some huge tables which have around 10years data, and each table has approx 10000000+ rows.

Basically, I want to remove older than 2 years data and make a copy of deleted data in a separate table(I'll export the separate table to CSV and drop it later)

I found this great article: Archive huge record

I want the same thing, but I don't have an ID column in my tables.

So the process should like this,

Insert a single row in my separate table, then delete the row from the main table; commit.

Then, repeat the same step.

Its my main production table and its always busy, so no locks, blocks, and deadlock.

Can anyone suggest a better approach for this?

I have already asked a question about archival, but here I don't want to partition the table.

Best Answer

Since you want to archive most of the table, here's what I recommend. And it will facilitate subsequent archivings.

This assumes

  • The table has a DATETIME or TIMESTAMP; I'll call it dt in the code below.
  • New rows will be INSERTed in chronological order or nearly so. (This assumption may not matter.)
  • You can take some downtime, roughly equivalent to twice the time it takes to scan the entire table. If you cannot handle this much downtime, then other (messier) techniques may be possible.

Setup:

SELECT @cutoff = CURDATE() - INTERVAL 2 YEAR;  -- need same cutoff twice
CREATE TABLE new LIKE real;   -- this will become `real` later.
ALTER TABLE new ADD some partitioning ENGINE=InnoDB

The partitioning will be BY RANGE(TODAYS(dt)) with 26 monthly partitions. See http://mysql.rjweb.org/doc.php/partitionmaint for discussion of what they will look like. See below for future archivings.

Shrink real to the last 2 years:

-- Stop writes to the table `real`
-- Note:  This will take a long time
INSERT INTO new SELECT * FROM real WHERE dt >= @cutoff;  -- 2 of 10 years = 20% of table?
RENAME TABLE real TO old, new TO real;
-- Allow writes to the table `real`

Archive older stuff:

SELECT ... INTO OUTFILE '...csv' FROM old WHERE dt < @cutoff;

Future archivings:

Now that you have the table partitioned, you need to do some monthly maintenance:

  • Use "transportable tablespaces" to remove the oldest partition from real and turn it into a csv. (Perhaps ALTER TABLE foo ENGINE=CSV;?)
  • REORGANIZE PARTITION future INTO next-month, future to slide the timescale forward.

These steps will be very fast, and have no impact on inserts into real (unlike the original split).