MySQL – Delete older Records from a huge table


I have some huge tables that hold around 10 years of data; each table has roughly 10,000,000+ rows.

Basically, I want to remove data older than 2 years and keep a copy of the deleted rows in a separate table (I'll export that table to CSV and drop it later).

I found this great article: Archive huge record

I want the same thing, but I don't have an ID column in my tables.

So the process should look like this:

Insert a single row in my separate table, then delete the row from the main table; commit.

Then, repeat the same step.

It's my main production table and it's always busy, so I need to avoid locks, blocking, and deadlocks.

Can anyone suggest a better approach for this?

I have already asked a question about archival, but here I don't want to partition the table.

Best Answer

Since you want to archive most of the table, here's what I recommend. And it will facilitate subsequent archivings.

This assumes:

The table has a DATETIME or TIMESTAMP; I'll call it dt in the code below.
New rows will be INSERTed in chronological order or nearly so. (This assumption may not matter.)
You can take some downtime, roughly equivalent to twice the time it takes to scan the entire table. If you cannot handle this much downtime, then other (messier) techniques may be possible.

Setup:

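SET @cutoff = CURDATE() - INTERVAL 2 YEAR;  -- need same cutoff twice
CREATE TABLE `new` LIKE `real`;   -- this will become `real` later; backticks needed because REAL is a reserved word
ALTER TABLE `new` ENGINE=InnoDB PARTITION BY RANGE (TO_DAYS(dt)) ( ... );  -- pseudocode: supply the partition list described below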

The partitioning will be BY RANGE(TO_DAYS(dt)) with 26 monthly partitions. See http://mysql.rjweb.org/doc.php/partitionmaint for discussion of what they will look like. See below for future archivings.
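As a rough illustration of the partitioning described above, the sketch below shows the general shape of the statement. The partition names and boundary dates are hypothetical, only three of the monthly partitions are written out, and it assumes dt is a DATETIME (MySQL only permits UNIX_TIMESTAMP() as the partitioning function on a TIMESTAMP column) and that dt is part of every unique key on the table, including the primary key.

```sql
-- Sketch only: partition names and dates are made up for illustration.
-- Assumes dt is DATETIME and appears in every UNIQUE/PRIMARY key.
ALTER TABLE `new`
  PARTITION BY RANGE (TO_DAYS(dt)) (
    PARTITION p2023_01 VALUES LESS THAN (TO_DAYS('2023-02-01')),
    PARTITION p2023_02 VALUES LESS THAN (TO_DAYS('2023-03-01')),
    PARTITION p2023_03 VALUES LESS THAN (TO_DAYS('2023-04-01')),
    -- one partition per month continues in the same pattern
    PARTITION future   VALUES LESS THAN MAXVALUE  -- catch-all for rows newer than the last month
  );
```

The trailing catch-all partition is what later gets split each month to slide the window forward.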

Shrink real to the last 2 years:

-- Stop writes to the table `real`
-- Note:  This will take a long time
INSERT INTO `new` SELECT * FROM `real` WHERE dt >= @cutoff;  -- 2 of 10 years = 20% of table?
RENAME TABLE `real` TO `old`, `new` TO `real`;
-- Allow writes to the table `real`
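One optional safeguard, not part of the original answer: while writes are still stopped and before the RENAME, a row-count comparison can confirm that the copy is complete. This is a minimal sketch that reuses the table names and the @cutoff variable from the steps above.

```sql
-- Run after the INSERT ... SELECT and before the RENAME, with writes stopped.
-- The two counts should be equal if the copy succeeded.
SELECT COUNT(*) AS rows_to_keep FROM `real` WHERE dt >= @cutoff;
SELECT COUNT(*) AS rows_copied  FROM `new`;
```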

Archive older stuff:

SELECT ... INTO OUTFILE '...csv' FROM old WHERE dt < @cutoff;
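A fuller form of the export might look like the sketch below. The output path is hypothetical, and selecting every column with * is an assumption; the original leaves both unspecified. The file is written on the database server's host, must not already exist, and the directory may be restricted by the secure_file_priv setting.

```sql
-- Sketch: the path and the column list (*) are assumptions, not from the answer.
SELECT *
  FROM `old`
 WHERE dt < @cutoff
  INTO OUTFILE '/var/lib/mysql-files/old_rows.csv'   -- hypothetical path on the server
       FIELDS TERMINATED BY ',' OPTIONALLY ENCLOSED BY '"'
       LINES TERMINATED BY '\n';
```

NULL values are written as \N in the output, which is worth knowing before loading the file elsewhere.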

Future archivings:

Now that you have the table partitioned, you need to do some monthly maintenance:

Use "transportable tablespaces" to remove the oldest partition from real and turn it into a csv. (Perhaps ALTER TABLE foo ENGINE=CSV;?)
REORGANIZE PARTITION future INTO next-month, future to slide the timescale forward.

These steps will be very fast, and have no impact on inserts into real (unlike the original split).
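A sketch of one monthly cycle follows. Note that it swaps in ALTER TABLE ... EXCHANGE PARTITION rather than the transportable-tablespace approach the answer mentions, because it is a simpler way to detach a partition into a standalone table. All partition and table names (p2023_01, p2025_04, archive_2023_01) are hypothetical, and it assumes the table has no foreign keys, which EXCHANGE PARTITION does not allow.

```sql
-- 1. Create an empty, non-partitioned table with the same structure.
CREATE TABLE archive_2023_01 LIKE `real`;
ALTER TABLE archive_2023_01 REMOVE PARTITIONING;

-- 2. Swap the oldest partition's rows into it, then drop the now-empty partition.
ALTER TABLE `real` EXCHANGE PARTITION p2023_01 WITH TABLE archive_2023_01;
ALTER TABLE `real` DROP PARTITION p2023_01;

-- 3. Split the catch-all partition to add the next month.
ALTER TABLE `real` REORGANIZE PARTITION future INTO (
    PARTITION p2025_04 VALUES LESS THAN (TO_DAYS('2025-05-01')),
    PARTITION future   VALUES LESS THAN MAXVALUE
);
```

The detached table can then be exported to CSV and dropped at leisure. The REORGANIZE step stays cheap as long as the future partition is still empty when it runs.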

Related Solutions

Mysql – Delete and archive old data from several related tables

Try pt-archiver from Percona Toolkit; it can transfer data on the fly between two MySQL instances.

You can, for instance, use the --where option to filter the result set by date.

It is particularly well suited to your needs, because it can transfer the data and delete it from the source in the same command.

The official documentation page is: pt-archiver

An example:

pt-archiver --source h=<server_source>,D=<database_source>,t=<your_table> --dest h=<server_target> --where "date_field < DATE_SUB(NOW(), INTERVAL 3 MONTH)" --limit 1000 --txn-size 1000 --statistics

Add the --dry-run option if you want to test the command without changing any data.

Add the --no-delete option if you want to keep the data on the source DB.

Mysql – Using MySQL InnoDB as an Archive

I would be very tempted to store this in a NoSQL data store, like MongoDB or CouchDB. Writes are very fast, and these stores scale out well.

You might even archive in a MongoDB collection, then store "processed" results in an RDBMS, which you can then query very quickly with SQL.

To stay in MySQL, you're looking at some sort of partitioning scheme to get this to scale at all.
