MySQL Backup – Resolving Increasing tar.gz Size Despite Constant Folder Size

MySQL

Our production site runs on a LAMP stack and we have a daily mysql backup that produces a tar.gz file. I recently reduced the size of some tables by deleting old rows after 3 months rather than 6, and the daily tar.gz file got bigger. The next day I removed an inefficient TEXT field and replaced it with a VARCHAR(3) field…and the daily tar.gz file got bigger. After unpacking into folders, all the individual files and folders (including the ibdata file) in the daily backups appear to be the same size – but on a day by day basis the tar.gz files get ~14mb bigger each day. According to the table size info I get from MySQL, I have reduced the size of my db.

Nothing is actually broken here – site is running fine and the db is responsive – but I cannot find any explanation as to why my backups keep getting bigger even though the db is getting smaller and it's frankly doing my head in. If anyone can offer an explanation that would be great.

Best Answer

You are backing up the raw files, not a "dump" of the rows (via mysqldump or Xtrabackup), correct?

When you DELETE, the rows are held onto in case of a ROLLBACK (explicit or due to crash). This may increase the file size.

Data (and index) files never shrink in MySQL without explicit action. In particular, there is no way to shrink ibdata1 without dumping and reloading.

So...

Plan A: Use mysqldump or xtrabackup to extract the rows. Then, if you like, send that through tar and gzip (or whatever).

Plan B: If the tables are "file_per_table", OPTIMIZE TABLE to shrink the .ibd files before the next dump. (You are stuck with the bloated ibdata1.)

Plan C: If not file_per_table, that is, if all the data is in ibdata1, do a one-time dump, rm ibdata1, reload. This is somewhat tricky and somewhat risky. Test before risking the production data.

If you will regularly be purging old data, consider using Partitioning as an efficient way to purge old data. If you intend to keep only 3 months' worth, then partition by week (14 partitions). Details here. It involves DROP PARTITION which will not leave behind stuff like DELETE does.

As for why the "steep increase" did not continue to happen... The deleted rows left "free" space in the tablespace; that was being reused. For now, your dumps will continue to be about the same size. (Note: gzip, alone, will fluctuate in size as the data changes, even if the the input size seems to stay the same. This is a characteristic of compression algorithms.)