There's a trick you can use, at least for date-range partitions: the HIGH_VALUE field in user_tab_partitions is a valid date expression. If you "eval" that, you get a date object for the cutoff point of that partition, which you can then use to build the partition name.
Sample PL/SQL to try out (generates a string from the high_value
column):
declare
  hv varchar2(9);
begin
  for x in (select partition_name, high_value
              from user_tab_partitions
             where table_name = 'FOO'
               and partition_name not like 'PART_%')
  loop
    execute immediate
      'select to_char(' || x.high_value || '-1,''YYYYMMDD'') from dual'
      into hv;
    dbms_output.put_line('alter table foo rename partition ' || x.partition_name
                         || ' to PART_' || hv);
  end loop;
end;
/
(Replace the put_line with execute immediate once you've verified that it actually does what you want. The -1 in the date select is there to make the names more "human friendly": HIGH_VALUE is an exclusive upper bound, so subtracting one day makes the partition name match the last date the partition actually contains.)
As for dropping old partitions, you can use the same trick: the same type of loop to get a date object for each partition, generating an alter table ... drop partition ... statement for the partitions you don't want to keep.
Simplistic example where you would drop partitions older than 10 days (be very wary of off-by-one errors; there probably is one below):
declare
  dt date;
begin
  for x in (select partition_name, high_value
              from user_tab_partitions
             where table_name = 'FOO')
  loop
    execute immediate 'select ' || x.high_value || ' from dual' into dt;
    if dt < sysdate - 10 then
      dbms_output.put_line('to drop: ' || x.partition_name);
    end if;
  end loop;
end;
/
You're then left with taking care of index maintenance if necessary.
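On that note, here is a sketch of the two usual options for global indexes (p_old and foo_idx are hypothetical names, so adapt them to your schema):

```sql
-- Option 1: let Oracle maintain global indexes during the drop
-- (slower drop, but the indexes stay usable):
alter table foo drop partition p_old update global indexes;

-- Option 2: drop fast, then rebuild any global index left UNUSABLE:
alter table foo drop partition p_old;
alter index foo_idx rebuild;
```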
This is a good question. You have several solutions, but your table is quite big so none will be painless :)
You have four solutions to "shrink" InnoDB tables:
1. OPTIMIZE TABLE
You can use OPTIMIZE TABLE as you mentioned, but first you should check the innodb_file_per_table variable:
mysql> show variables like "innodb_file_per_table";
+-----------------------+-------+
| Variable_name | Value |
+-----------------------+-------+
| innodb_file_per_table | ON |
+-----------------------+-------+
1 row in set (0.00 sec)
Let me explain:
With InnoDB tables, OPTIMIZE TABLE locks the table, copies the data into a new clean table (that's why the result is smaller), drops the original table, and renames the new table to the original name. That's why you should make sure you have twice the size of the original table available on disk (you'll probably need less, since the optimized table will be smaller, but it's better to be safe than sorry).
innodb_file_per_table = ON: In this mode, every table has its own data file. The OPTIMIZE statement will create a new data file with optimized space usage. When the operation is finished, MySQL drops the original file and replaces it with the optimized one, so the space held by the original table (~700GB, probably more than the optimized copy needs) is given back to the filesystem.
innodb_file_per_table = OFF: In this mode, all data lives in a single data file: ibdata. This mode has a big drawback, because ibdata never shrinks. So during the OPTIMIZE process your new table will be created (near 700GB), but even after the drop and rename (at the end of the OPTIMIZE phase) ibdata will not release the ~700GB. You wanted to free some space; instead you now have 700GB more. Cool, isn't it?
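For reference, the statement itself is just the following (myTable is a placeholder; on InnoDB, MySQL maps it to a full table rebuild internally):

```sql
OPTIMIZE TABLE myTable;
```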
2. ALTER TABLE
You can also use an ALTER TABLE statement; a "null" ALTER TABLE rebuilds the table the same way OPTIMIZE TABLE does. You can just use:
ALTER TABLE myTable ENGINE=InnoDB;
3. ALTER TABLE (ONLINE)
The problem with OPTIMIZE and ALTER TABLE is that they lock the table during the operation. You can use the Percona tool pt-online-schema-change (from Percona Toolkit: link). pt-online-schema-change provides mechanisms to rebuild the table while keeping the original available for reads and writes. I use this tool in production for ALTER statements on big tables and it's pretty cool.
Note that any FOREIGN KEYs referencing your table might complicate things, since locks on it can lead to locks on other tables and so on. To check this, simply query:
mysql> SELECT COUNT(*) FROM information_schema.REFERENTIAL_CONSTRAINTS WHERE REFERENCED_TABLE_NAME = "myTable";
+----------+
| COUNT(*) |
+----------+
| 0 |
+----------+
1 row in set (0.04 sec)
Here is how I use pt-online-schema-change:
pt-online-schema-change --alter "ENGINE=InnoDB" D=myBase,t=myTable --user --ask-pass
Note that my remark about innodb_file_per_table also applies to this solution.
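A cautious pattern, sketched here with placeholder names (and note that --user normally takes a username), is to do a dry run before executing:

```shell
# Validate the copy/rename plan without changing anything:
pt-online-schema-change --alter "ENGINE=InnoDB" D=myBase,t=myTable \
    --user myUser --ask-pass --dry-run

# If the dry run looks sane, do it for real:
pt-online-schema-change --alter "ENGINE=InnoDB" D=myBase,t=myTable \
    --user myUser --ask-pass --execute
```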
4. mysqldump
The last solution is to recreate all databases from a dump. It takes forever, but it's extremely efficient. Note that this is the only way to shrink your ibdata file if innodb_file_per_table is OFF.
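The rough procedure is sketched below. All paths and names are placeholders, and step 2 is destructive, so only do it after verifying the dump:

```shell
# 1. Dump everything, including stored routines and triggers:
mysqldump --all-databases --routines --triggers -u root -p > full_dump.sql

# 2. Stop the server and remove the InnoDB system files from the datadir
#    (ibdata1, ib_logfile0, ib_logfile1) so they are recreated at minimal size.

# 3. Enable innodb_file_per_table in my.cnf, restart the server, and reload:
mysql -u root -p < full_dump.sql
```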
Max.
Best Answer
The Event Scheduler allows you to execute recurring events on a schedule. There is a detailed example in my post on Stack Overflow; you just need to change the time interval from 24 hours to 6 months.
Firstly, make sure the Event Scheduler is enabled.
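It can be enabled at runtime (with a privileged account) or via event_scheduler=ON in the server configuration file:

```sql
-- Check whether the scheduler is currently running:
SHOW VARIABLES LIKE 'event_scheduler';

-- Turn it on for the running server:
SET GLOBAL event_scheduler = ON;
```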
After that you can create an event that checks each row's creation time and deletes old records. For example:
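Here is a sketch of such an event, assuming your table is called myTable and has a created_at timestamp column (adjust the names, schedule, and retention interval to your case):

```sql
CREATE EVENT purge_old_myTable_rows
    ON SCHEDULE EVERY 1 DAY
    DO
      DELETE FROM myTable
       WHERE created_at < NOW() - INTERVAL 6 MONTH;
```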
If there is no column holding each row's creation timestamp in your table, you can create a trigger that inserts the current timestamp and the identifier of the inserted row into an auxiliary table.
You can then use this log to find the keys of the main-table rows that were created before a specific time and delete the corresponding records.
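Roughly like this (a sketch; myTable, its primary key id, and the log-table name are assumptions):

```sql
-- Auxiliary log of creation times:
CREATE TABLE myTable_created_log (
    row_id     INT NOT NULL PRIMARY KEY,
    created_at TIMESTAMP NOT NULL DEFAULT CURRENT_TIMESTAMP
);

-- Record every insert into the main table:
CREATE TRIGGER myTable_log_insert
    AFTER INSERT ON myTable
    FOR EACH ROW
      INSERT INTO myTable_created_log (row_id) VALUES (NEW.id);

-- Periodic cleanup: multi-table delete of rows logged
-- more than 6 months ago:
DELETE m, l
  FROM myTable m
  JOIN myTable_created_log l ON l.row_id = m.id
 WHERE l.created_at < NOW() - INTERVAL 6 MONTH;
```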