MySQL InnoDB Optimization – Dynamic Table Optimization Without ‘File Per Table’

Tags: innodb, mysql, optimization

We are getting a “too many connections” error once a week, at the same time a MySQL stored procedure runs. The procedure runs “optimize table” on hundreds of tables and takes nearly ten hours to finish, taking websites offline. I am thinking that because tables are locked, connections get backed up, or something like that. So I believe the procedure is the cause of this error, and I am looking at ways to refactor it to be smarter about which tables it optimizes and/or to chop the task up so it runs over many days.

This procedure filters which tables and databases to run the optimize command on only by using LIKE against the table or database name. So, for every matching table, “optimize table {tablename}” is run. These tables are of both MyISAM and InnoDB engine types.

One refactoring approach would be to optimize a table only if it really needs it. If fragmentation is the signal that a table needs optimizing, then detecting it is not too difficult when the table is MyISAM, or InnoDB with innodb_file_per_table enabled: you can do the math on the information_schema fields data_free, data_length, and index_length.
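That math could be sketched roughly as follows. The 100 MB threshold, the helper name `frag_pct`, and the example numbers are illustrative assumptions, not recommendations, and the query is only meaningful for MyISAM tables or InnoDB with innodb_file_per_table enabled:

```shell
# Sketch: list tables with a notable amount of reclaimable space.
# Assumes MyISAM, or InnoDB with innodb_file_per_table, so that
# data_free is a per-table figure. Threshold is illustrative.
FRAG_QUERY="
SELECT table_schema, table_name, data_free,
       ROUND(100 * data_free /
             NULLIF(data_free + data_length + index_length, 0), 1) AS frag_pct
FROM information_schema.tables
WHERE engine IN ('MyISAM', 'InnoDB')
  AND data_free > 100 * 1024 * 1024;
"
# On a live server: mysql -N -e "$FRAG_QUERY"

# The percentage itself is just arithmetic on the three fields:
frag_pct() {   # args: data_free data_length index_length (bytes)
    awk -v f="$1" -v d="$2" -v i="$3" \
        'BEGIN { printf "%.1f", 100 * f / (f + d + i) }'
}
frag_pct 52428800 78643200 26214400   # 50 MB free of 150 MB total -> 33.3
```

A table scoring high on `frag_pct` would be a candidate for optimizing; one near zero could be skipped that week.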

Unfortunately the MySQL server is not using the “file per table” setting. This makes the data_free field seemingly useless for this task, because every InnoDB table will have the same value, i.e. the free space in the ibdata file.

Maybe if I can understand the answers to these questions, I will better understand the solution I need.

  • Q 1: How much fragmentation is allowed before it affects performance?
  • Q 2: Should InnoDB tables even be optimized (some say yes others say no)?
  • Q 3: How do you test for InnoDB fragmentation if the server does not use the “file per table” option?
  • Q 4: Is fragmentation the only reason to run “optimize table”?
  • Q 5: If I do need to run “optimize table” on an InnoDB table, should I run ALTER TABLE mydb.mytable ENGINE=InnoDB; rather than ANALYZE TABLE?
  • Q 6: Can you selectively tell which InnoDB tables need optimizing if the server does not use the "file per table" option?

Best Answer

Before answering the question, please see the InnoDB Infrastructure Map.

Based on innodb_file_per_table being disabled, let's go through your questions:

Q 1: How much fragmentation is allowed before it affects performance?

The system tablespace can grow to the limit of the disk volume.

EXAMPLE: I just answered a question about what to do when the system tablespace reaches the limit of an ext3 disk: How to solve "The table ... is full" with "innodb_file_per_table"?

There may still be some wiggle room inside the system tablespace. However, when the wiggle room dwindles to the point that all 1023 undo logs inside the system tablespace are completely filled and can no longer extend, then you must add a new system tablespace file.

Please note that when I say wiggle room, I am referring to the free space within the system tablespace that must accommodate the following:

  • Data Dictionary
  • Double Write Buffer (can be disabled but not recommended)
  • Insert Buffer (caches secondary index changes in the system tablespace)
  • Rollback Segments (1023 slots)
  • Undo Logs (referenced from the Rollback Segments)
  • Please refer Back to the InnoDB Infrastructure Map

Q 2: Should InnoDB tables even be optimized (some say yes others say no)?

If you run OPTIMIZE TABLE, you basically make the data and index pages contiguous in the system tablespace. This defragments the table and makes access to all data and indexes quicker, until the fragmentation reappears over time in production use. This can introduce new areas of fragmentation. Again, all that fragmentation can fill up with data and indexes, which endangers the wiggle room I mentioned before.

Q 3: How do you test for InnoDB fragmentation if the server does not use the “file per table” option?

Back on Aug 27, 2012, I answered this post : How To Optimize and Repair InnoDB tables? ALTER and OPTIMIZE table failed

I explained there how to get the fragmentation. In essence, you do this:

Go to the OS and run

cd /var/lib/mysql
ls -l ibdata1 | awk '{print $5}'

This gets you the size of ibdata1 in bytes

SELECT (data_length+index_length) InnoDBDataIndexBytes
FROM information_schema.tables WHERE engine='InnoDB';

This gets you the sum total of data and index pages in bytes

Subtract the sum total from ibdata1's total bytes. The difference represents the wiggle room. That space is where the fragmentation lives, but it is constantly reused until ibdata1 fills up.
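Put together, the two steps above amount to one subtraction. A minimal sketch, where the datadir path and mysql client invocation assume a default Linux setup, and the example numbers are made up:

```shell
# On a live server (assuming the default /var/lib/mysql datadir and a mysql
# client that can log in), the two inputs would come from:
#   IBDATA_BYTES=$(ls -l /var/lib/mysql/ibdata1 | awk '{print $5}')
#   USED_BYTES=$(mysql -N -e "SELECT SUM(data_length+index_length)
#                             FROM information_schema.tables
#                             WHERE engine='InnoDB'")

wiggle_room() {   # args: ibdata1 size in bytes, data+index total in bytes
    echo $(( $1 - $2 ))
}

# Made-up example: a 1 GiB ibdata1 holding 768 MiB of data and index pages
wiggle_room 1073741824 805306368   # -> 268435456 (256 MiB of wiggle room)
```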

CAVEAT : When innodb_file_per_table is enabled, I explain how to get the fragmentation of an individual table: Innodb table with many deletes and inserts - is there any disk space wasted?

Q 4: Is fragmentation the only reason to run “optimize table”?

Yes. It is far more beneficial for MyISAM tables and for InnoDB tables with innodb_file_per_table enabled. Do this with innodb_file_per_table off and you will just make the system tablespace grow faster. See my post How can Innodb ibdata1 file grows by 5X even with innodb_file_per_table set?

Q 5: If I do need to run “optimize table” on an InnoDB table should I run ALTER TABLE mydb.mytable ENGINE=InnoDB; and not ANALYZE TABLE

Running ALTER TABLE mydb.mytable ENGINE=InnoDB; would indeed shrink the table when innodb_file_per_table is enabled. Again, it is not worth it when innodb_file_per_table is disabled.
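To make the distinction concrete: on InnoDB, OPTIMIZE TABLE is itself mapped to a table rebuild plus an ANALYZE (the server even reports "Table does not support optimize, doing recreate + analyze instead"), while ANALYZE TABLE alone only refreshes index statistics and moves no data. A small sketch, with `rebuild_stmt` as an assumed helper and mydb.mytable being the placeholder name from the question:

```shell
# Assumed helper: build the null-ALTER rebuild statement for a given table.
rebuild_stmt() { echo "ALTER TABLE $1 ENGINE=InnoDB"; }

# On a live server you would feed it to the client:
#   mysql -e "$(rebuild_stmt mydb.mytable)"   # rebuilds and defragments
#   mysql -e "ANALYZE TABLE mydb.mytable"     # statistics only, no rebuild
rebuild_stmt mydb.mytable   # -> ALTER TABLE mydb.mytable ENGINE=InnoDB
```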

Q 6: Can you selectively tell which innodb tables needs optimizing if the server does not use the "file per table" option?

No, you cannot. Why? The INFORMATION_SCHEMA becomes totally useless for this purpose because all the tables live inside one file. I wrote a script to find the update_time (the last time an InnoDB table was written) of all InnoDB tables: Is there a way to find the least recently used tables in a schema? That script only works with innodb_file_per_table. This shows that you cannot ascertain that fragmentation with ease. You could resort to more aggressive techniques, like dumping the tablespace map and locating segments with unused space; see this blog post: http://www.markleith.co.uk/2009/01/19/innodb-table-and-tablespace-monitors/. That is way too much firepower for this task. You could just run OPTIMIZE TABLE to eliminate segment fragmentation, but that brings us back full circle to getting everything out of ibdata1.

SUGGESTION

If you want to remove all data and index pages from ibdata1 and shrink ibdata1 permanently, please read my Oct 29, 2010 StackOverflow post Howto: Clean a mysql InnoDB storage engine?

As you can see, this subject is not new to me.

EPILOGUE

Running OPTIMIZE TABLE is not the biggest reason that ibdata1 grows quickly. Please see this post from mysqlperformanceblog and learn about the other contributing factors.

Please remember that most people who run OPTIMIZE TABLE do so sequentially. You could probably script many of them to run in parallel. Of course, you would need to convert to innodb_file_per_table, as I mentioned before.
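A parallel run could be sketched like this. The table list, the worker count, and the dry-run echo are all illustrative assumptions; you would swap the echo for a real mysql -e call, and it is only worthwhile once innodb_file_per_table is on:

```shell
# Parallel OPTIMIZE TABLE sketch. The table list would normally come from
# information_schema; it is hard-coded here. 4 workers is an arbitrary choice.
optimize_cmd() {   # arg: schema.table -> the statement that would be run
    echo "OPTIMIZE TABLE $1;"
}

TABLES="shop.orders shop.customers blog.posts"
printf '%s\n' $TABLES | xargs -P 4 -I {} sh -c \
    'echo "would run: OPTIMIZE TABLE {};"'
# For real: printf '%s\n' $TABLES | xargs -P 4 -I {} mysql -e "OPTIMIZE TABLE {}"
optimize_cmd shop.orders   # -> OPTIMIZE TABLE shop.orders;
```

Splitting the ten-hour serial job across a few workers, or across several nights using a fragmentation filter like the one above, would both shrink the weekly outage window.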