Mysql – Appropriate use of thesql Optimize Table

innodbmaintenanceMySQLoptimization

I want to come up with some best practice for maintaining our MySQL database, versions 5.5/6 and using InnoDB.

I came across this article which is basically saying that Optimize table:

  1. not going to show much improvement in case your queries don't use the PK.
  2. other indexes on the table are built in a pseudo-random order and most likely would not benefit from the Optimize table.
  3. could actually make Updates slower due to the fact that now each change have higher probability of resulting in a page split.

My questions are:

  1. Are the 3 points above always true? Partially? Not at all?
  2. When is it a good shot to try and Optimize a table?
  3. In which cases Optimizing a table would not benefit or even make some uses of the table worse?
  4. Is there a way to Optimize the indexes of a table other than its PK?

Best Answer

Baron Schwartz, the author of that post, is one of the coauthors of High Performance MySQL, 3rd Edition, one of the best books out there regarding MySQL performance. While the argument of authority is not always a good one, I would like to remark that probably he knows what he is saying.

While everything he says is correct -in my humble opinion-, you must understand the actual underlying argument: defragmenting an InnoDB table is in many cases useless (for performance), and many people recommending to do it frequently are wrong.

Fragmentation and page-splitting is a delicate topic, which people like Jeremy Cole http://blog.jcole.us/2013/04/09/innodb-bugs-found-during-research-on-innodb-data-storage/ and Facebook engineers have mentioned a lot (specially, regarding to compression): https://www.facebook.com/note.php?note_id=10150348315455933 and its implications on performance.

Many times, your performance is load-dependent- Do you insert using an auto-increment? Do you insert and delete many times in the middle of your table? Can you afford the extra disk space if your table is very dynamic?

There are some good practices that I can recommend you (which is probably what you want):

  • Defragment only if you have done a batch DELETE of lots of records (and you do not intend to insert them back). In other cases, it may not be necessary. If you want to know if there is a huge difference between logical data and file size, compare the .idb file with the data + index size from show table status.
  • Speedup insertion by inserting always in PRIMARY KEY order, so you do not force unnecessary page splits.
  • Use partitioning to isolate changes that may alter the internal structure of the table.
  • There is no way to "optimize the secondary indexes", but I would never find such a thing necessary. The change buffer makes sure that changes/rebalances to indexes are done asyncronously without huge performance problems. A BTREE should always be balanced, so assuming your change buffer is not full and the purge thread deletes old row records promptly, your performance should be ok. One way I can think of "optimizing secondary indexes", as you called it, is DROPing the index and recreating it (asuming you are using the InnoDB Plugin or MySQL 5.5+), but I see absolutely no reason to do that.

Of course, if you really want to dig into this topic, create some tables, defragment them and check if you actually have some gain afterwards. In general, tablespace handling and statistics gathering is something relatively automatic on InnoDB.