Mysql – I’m building up a large MyISAM table from a few dumps. The first 21GB took less than 10 hrs total. The next 5GB is forecasted to take two months

myisammysqldumprestore

I had a couple machines building up identical MyISAM tables. I've dumped all of those tables to a few files, and am now trying to restore them all into a single table on my own machine.

The first four restores, done by pv'ing the dumpfile into mysql, took no more than 10 hrs in total (may have been as little as 5). Those files were around 5-13 GB each.

The table is now about 60 million rows and 21 GB. It is also hash-partitioned into 100 partitions. However, I partitioned it when it was 20 million rows and got it up to 60 million without much problem.

Now I'm trying to dump another 5GB (maybe about 15 million rows) into the table, and pv is predicting it will take 50+ days. This doesn't seem to be a problem with pv, however, because the db's data directory is growing very slowly when I do the restore.

Additionally, I've looked at gnome-system-monitor, and the CPU and memory usage seems to be quite low. Tried rebooting and attempting to restore again, no change.

This is actually my second attempt at building up the big table. I tried once before over a longer period (waiting for the data to come in on the other machines), and suddenly I was getting this same problem.

This is a table with a single (natural) BIGINT primary key. I don't know a whole lot about database internals, but I tried a query suggested in this other post: slow load speed of data from mysqldump

to get the size of the index. Here are the results (reformatted for readability)

Storage Engine: MyISAM
Data Size: 19.706 GB
Index Size: 0.954 GB
Table Size: 20.660 GB

I would make sense that inserts would get slower as the table gets larger. However… it doesn't make sense that going from 60GB to 80 or less should take 100 times as long as going from 0 to 60.

So, does anyone have any guess what might be causing this?

Thanks

Best Answer

After a bit more research, I ran "OPTIMIZE table_name", and now the restore that would have taken two months is taking half an hour. According to the docs, the following is a good time to do run OPTIMIZE:

After deleting a large part of a MyISAM or ARCHIVE table, or making many changes to a MyISAM or ARCHIVE table with variable-length rows (tables that have VARCHAR, VARBINARY, BLOB, or TEXT columns). Deleted rows are maintained in a linked list and subsequent INSERT operations reuse old row positions. You can use OPTIMIZE TABLE to reclaim the unused space and to defragment the data file. After extensive changes to a table, this statement may also improve performance of statements that use the table, sometimes significantly.

I didn't do any deletions (does "changes" include insertions?), but I do have big VARCHAR fields.

from here: http://dev.mysql.com/doc/refman/5.1/en/optimize-table.html