First of all, what entry types reside in ibdata1? There are four (4):
- Table MetaData
- Table Data Pages
- Index Data Pages
- MVCC Data
Whenever an InnoDB table experiences DDL, DML, or participates in a transaction, all four of these entry types are read or written. If innodb_file_per_table is disabled, all four live in ibdata1. If it is enabled, only the Table MetaData and MVCC Data reside in ibdata1, while the Table Data Pages and Index Data Pages live in the database subfolder as a .ibd file.
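For reference, this behavior is controlled by a single setting in my.cnf. A minimal sketch (section name and value are the standard ones; adjust to your installation):

```ini
[mysqld]
# OFF (the default in older MySQL versions): table and index data pages
# share the system tablespace in ibdata1.
# ON: each InnoDB table gets its own .ibd file in the database subfolder;
# ibdata1 keeps only the metadata and MVCC (undo) data.
innodb_file_per_table = 1
```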
With that in mind, what would happen if ibdata1 were placed on another volume and symlinked?
For starters, how does MySQL represent a table, regardless of the storage engine? As a .frm file. Where do .frm files live? In the datadir. What's wrong with that?
Here is an example:
Using the default datadir of /var/lib/mysql, let's use an InnoDB table called mydb.mytable.
With innodb_file_per_table disabled, everything would sit in ibdata1 (which you are proposing to symlink and send off to another data volume). For the table mydb.mytable, this is what you would have:
- /var/lib/mysql/mydb/mytable.frm
- Everything else about the table lives in ibdata1
Picture this: when you access the table, MySQL first hits /var/lib/mysql/mydb/mytable.frm and then hits the data and index pages for mydb.mytable in ibdata1. This happens on every access of mydb.mytable. That back-and-forth between volumes makes things a little slower, so you may not get the performance you were expecting by moving ibdata1 to some other data volume. In fact, the cascading effect now scales with the number of InnoDB tables multiplied by two (2).
Imagine having innodb_file_per_table enabled. Now you would have a slightly different setup:
- /var/lib/mysql/mydb/mytable.frm
- /var/lib/mysql/mydb/mytable.ibd
- MVCC data and Table MetaData would reside in ibdata1
The cascading would be a little worse here, because each table access now spans three files instead of two.
Here is one more scenario some have thought of: instead of moving ibdata1 to a different volume, how about enabling innodb_file_per_table and moving the .ibd files to one or more different data volumes? The cascading effect would then scale with the number of InnoDB tables multiplied by three (3). Percona has laid out very good reasons for not doing this that you will find helpful.
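To make the multipliers concrete, here is a small sketch (plain Python; the table count of 100 is a hypothetical) of how many distinct files get touched when every table is accessed once in each layout described above:

```python
# Files touched per table access in each layout:
#   shared ibdata1:     .frm + ibdata1                        -> 2
#   file-per-table:     .frm + .ibd + ibdata1 (metadata/MVCC) -> 3
FILES_PER_TABLE = {"shared_ibdata1": 2, "file_per_table": 3}

def files_touched(num_tables: int, layout: str) -> int:
    """Rough count of file hits when every table is accessed once."""
    return num_tables * FILES_PER_TABLE[layout]

print(files_touched(100, "shared_ibdata1"))  # 200
print(files_touched(100, "file_per_table"))  # 300
```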
This was a pretty rare situation. I hope to never end up there again, staring at an InnoDB "log sequence number is in the future!" error. Given my particular details, rebuilding/restoring my server's data was a last resort. Some suggested shortcuts were good ideas, but in the end I decided to just keep improving my Perl script to play this silly game and churn through as many gigs/hour as I could. What the heck, it's a good system stress test.
Remember: the goal is to increase a single counter ("log sequence number") which is stored somewhere in the headers of ib_logfile0 and ib_logfile1. This is to fake out InnoDB so it will ignore an apparent time warp and get on with life. But no one knows how to edit that number. Or if they do know, no one's talking.
Here's my final product. YMMV, but using MySQL's REPEAT() function to generate the data server-side is highly efficient.
#!/usr/bin/perl
use strict;
use warnings;
use DBI;

my $table = shift or die "Usage: $0 <table>\n";

# Edit "junk" (DB name), user, and pass to suit.
my $dbh = DBI->connect("DBI:mysql:junk:host=localhost", "user", "pass",
                       { RaiseError => 1 });
$dbh->do("DROP TABLE IF EXISTS $table");
# LONGTEXT: plain TEXT caps at 64 KB and would silently truncate the 10 MB rows.
$dbh->do("CREATE TABLE $table (str LONGTEXT) ENGINE=INNODB");
my $sth = $dbh->prepare("INSERT INTO $table (str) VALUES (REPEAT(?,1000000))");
foreach (1..50) {
    $sth->execute('0123456789'); # 10 MB per row, generated server-side
}
$dbh->do("DELETE FROM $table");
$dbh->disconnect;
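Each run of the script inserts roughly 500 MB of row data (50 inserts of 10 MB each), so the LSN advances at least that much per run. As a back-of-the-envelope sketch, assuming LSN growth per run is roughly the row data written (a simplification; redo records carry overhead and the DELETE adds more), you can estimate how many iterations will close a given gap:

```python
# Rough estimate: iterations of junk.pl needed to advance the LSN by a
# given gap. Assumes LSN growth per run is about the 500 MB of row data
# written (a simplification; actual redo volume will differ).
import math

BYTES_PER_RUN = 50 * 10_000_000  # 50 inserts x 10,000,000 bytes each

def runs_needed(lsn_gap_bytes: int, parallel_runs: int = 1) -> int:
    """Estimated sequential iterations to close an LSN gap."""
    per_iteration = BYTES_PER_RUN * parallel_runs
    return math.ceil(lsn_gap_bytes / per_iteration)

# A hypothetical 40 GB gap, with 4 copies of junk.pl running in parallel:
print(runs_needed(40 * 10**9, parallel_runs=4))  # 20
```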
My suggested recipe:
- Create a 'junk' database
- Save the above perl script as junk.pl.
- Run junk.pl data1, junk.pl data2, junk.pl data3, etc. all at once, one per CPU core on your database server to start. Open multiple shells and wrap each run in a Bash loop:
while true; do date; perl junk.pl dataX; done
Watch your LSN grow, perhaps in another loop:
silly# echo "SHOW INNODB STATUS \G" | mysql -p'xxxxxx' | grep '^Log seq'
Log sequence number 124 3871092821
silly# echo "SHOW INNODB STATUS \G" | mysql -p'xxxxxx' | grep '^Log seq'
Log sequence number 124 4209892586
silly# echo "SHOW INNODB STATUS \G" | mysql -p'xxxxxx' | grep '^Log seq'
Log sequence number 125 85212387
The big number is an unsigned 32-bit INT which wraps at 4 GB, incrementing the smaller number each time it does. In the example above it just rolled from 124 to 125. Your goal is hidden in the mysqld.log that sent you Googling for this ridiculous solution in the first place. Once you cross that finish line, that's it! Blow the horns! Release the confetti!
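The two-number format can be folded into a single 64-bit value, which makes the remaining distance to your target easy to compute. A quick sketch (plain Python; the numbers are from the session above):

```python
# Older InnoDB prints the LSN as two numbers: a high word and a low word,
# where the low word is an unsigned 32-bit value that wraps at 4 GiB.
def lsn_to_int(high: int, low: int) -> int:
    return high * 2**32 + low

before = lsn_to_int(124, 4209892586)
after = lsn_to_int(125, 85212387)  # low word wrapped, high word bumped
print(after - before)              # 170287097 bytes of log generated
```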
Sidebar: This uncovered an interesting bug in mysqld 5.0 with REPEAT: if you go to 20 MB, some internal counter flips and the result rolls over to ~96 KB, with no warning or error anywhere. I wasn't about to waste time tracking that down; 10 MB works great. If you do hit some other limit, it may complain. I have various InnoDB buffers increased from their defaults; season to taste. As always, watch mysqld.log in one window.
Best Answer
InnoDB may not release the disk space, but your queries against the table are actually hitting less data. The size of the actual table has been reduced; InnoDB simply keeps the freed storage for later growth of that table (or any other table sharing the tablespace).
Your InnoDB table may or may not perform better depending on what percentage of total rows you purged. If you removed 5% of total rows, for example, you probably won't see a noticeable speedup (at most about 5%). If you removed 90%, you'll see quite a bit of speedup.
Other approaches to increasing performance are writing a better query or indexing the table so that lookups are faster.