MySQL Delete Rows from Slave

innodbMySQLreplication

I want to truncate the majority of data on a slave such that it only has the most recent records. I'll eventually demote the current master, making it a read-only archive, and promote the truncated slave to master. The primary motivation for truncation is reclamation of disk space; I want to dump and restore the slave.

Questions:

  1. Regarding the dump/restore operation on the slave, I'm planning on stopping the IO and SQL threads, then dumping the table, then dropping the table, the restoring from the dump, then starting the slave threads. Does this sound like the right process?

  2. What happens, after deleting rows from the slave, if the master sends an UPDATE or DELETE referencing a deleted row?

Best Answer

    • If you are using InnoDB with the option innodb_file_per_table option off (by default on MySQL 5.1 and before), you need to stop the server after the dump (make sure you are not using innodb_fast_shutdown = 2), then delete at least the ibdata1, ib_logfile0, ib_logfile1 and your database directory and finally restart MySQL and import only the final truncated data. Otherwise, your ibdata1 file won't shrink in size.
    • If you have active the option innodb_file_per_table (by default on MySQL 5.5 and later), you have separate .ibd files per table, and your ibdata1 file is not too big, you do not need to import/export the data and delete the InnoDB files. Just delete the appropriate rows and defragment the .ibd files by doing:

      ALTER TABLE <your table> ENGINE=InnoDB; -- for 5.5 and before
      
      ALTER TABLE <your table> ENGINE=InnoDB, ALGORITHM=COPY; -- for 5.6 and later
      
  1. Replication will probably break. On ROW based replication, it will certainly break as it won't find the right rows to affect. On STATEMENT mode some queries may continue working, affecting 0 rows, but as time passes, it is highly probable to find some incompatible query, like an UPDATE that can be done on the master but not on the slave due to a unique key constraint. I do not recommend running a replication with different data on master and slave unless you know for sure that those parts are not modified or they are done in a very restricted way. I can tell you that most of the replication problems I fix are due to replication filters.