MySQL – Question About Large InnoDB Table with Deleted Rows

Tags: innodb, mysql

Our server has a DB that's over 600 GB. We are deleting older, non-essential rows to contain the growth. We can't optimize the table due to the possible downtime on the table, but we are interested in knowing whether queries will be faster with less data in the table, despite the fact that the table size on disk will not shrink at all.

So basically, if the size remains the same but we are able to delete a significant number of rows, will our more intensive queries be any faster?
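A rough sketch of this kind of incremental delete (the table name, column name, and cutoff date are placeholders):

    -- Hypothetical batched delete; mytable / created_at are placeholders.
    -- Small batches keep each transaction and its undo log short.
    DELETE FROM mytable
    WHERE created_at < '2012-01-01'
    LIMIT 10000;
    -- Repeat from a script until ROW_COUNT() returns 0.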

thanks!

Best Answer

If you are worried about the table locks caused by DDL operations like ALTER TABLE ... ENGINE=InnoDB (which is what OPTIMIZE TABLE does for InnoDB), you should know that in MySQL 5.6 that rebuild can be done fully online; alternatively, for earlier versions, you can use an online schema change tool such as pt-online-schema-change. With proper care, the process can be done fully online and with minimal extra load (although you will temporarily need double the table's disk space). Also make sure you are using innodb_file_per_table.
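For illustration, with placeholder names mytable and mydb, the 5.6 online rebuild and the pt-online-schema-change alternative look roughly like this:

    -- MySQL 5.6+: rebuild the table online; the statement fails
    -- rather than locking if an in-place rebuild is not possible.
    ALTER TABLE mytable ENGINE=InnoDB, ALGORITHM=INPLACE, LOCK=NONE;

    -- Verify that each table lives in its own .ibd file:
    SHOW VARIABLES LIKE 'innodb_file_per_table';

    -- For MySQL < 5.6, Percona Toolkit can do the rebuild online
    -- (shell command; D=database, t=table):
    --   pt-online-schema-change --alter "ENGINE=InnoDB" D=mydb,t=mytable --execute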

Assuming the table has been optimized and all deleted rows purged, yes, you can obtain an improvement, for several reasons:

  • Assuming you do not have 600 GB of RAM, a larger part of your InnoDB pages will fit into the buffer pool, reducing IO requests. You can check this before and after by monitoring your physical IO or by checking the InnoDB buffer pool hit statistics, such as Innodb_buffer_pool_read_requests and Innodb_buffer_pool_reads (see the sketch after this list)
  • Having more free space in the buffer pool can also smooth the commit writing process by allowing more dirty pages in memory (this depends highly on the load)
  • If your queries require full table scans, reducing the total number of rows will make those operations faster
  • Index scans should also be faster, especially index accesses over large ranges. InnoDB also takes a huge performance hit when the hot parts of its indexes do not fit in memory (its B-trees are designed to be cached)
  • Temporary tables, sorts, etc. could potentially be smaller, reducing the latency of the queries that create them
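As a sketch of the buffer pool check mentioned in the first bullet, compare these counters before and after the cleanup:

    -- Buffer pool counters (MySQL 5.6 and earlier; in 5.7+ they are
    -- also available in performance_schema.global_status).
    SHOW GLOBAL STATUS LIKE 'Innodb_buffer_pool_read%';

    -- Approximate hit ratio, as a percentage:
    --   100 * (1 - Innodb_buffer_pool_reads / Innodb_buffer_pool_read_requests)
    -- Innodb_buffer_pool_reads counts logical reads that had to go
    -- to disk; Innodb_buffer_pool_read_requests counts all of them.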

So the actual improvement depends a lot on the actual queries executed, the use (or not) of hash-based searches, the memory state, the penalty of random IO reads, and so on. But in general, working with smaller data sets is easier.

Beware also of the purge process, which can hurt your performance if it falls behind.
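A simple way to watch purge is the "History list length" value in the InnoDB status output; if it keeps growing while you delete, purge is falling behind:

    -- The TRANSACTIONS section reports "History list length":
    -- the number of undo log units that have not been purged yet.
    SHOW ENGINE INNODB STATUS\G

    -- Since MySQL 5.5, purge runs in its own thread(s):
    SHOW VARIABLES LIKE 'innodb_purge_threads';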

Some of these advantages could be smaller if there is a lot of fragmentation, but that depends a lot on the actual state of the table. You can see some thoughts about OPTIMIZE on InnoDB here (more in the comments):