I think I have a plausible explanation you may find very intriguing.
When you loaded the MyISAM table, only one index got loaded into the .MYI file in a lopsided manner.
Which index would that be? THE PRIMARY KEY. Why?
Indexes usually use BTREEs. They are designed to collect keys into a single BTREE node and balance the tree internally until a node is full. On an insert into a full node, that node is split and the keys inside are divided between the two resulting nodes.
The time a BTREE node most often splits is when you load the keys in order. For proof, look at the worst-case BTREE: the balanced binary tree.
I have several posts in the DBA StackExchange where I mention how a balanced binary tree has only one key per node. I also mention how leaf pages must rebalance about 45% of the time when inserting a new key. Here are those posts:
My working theory is that random inserts into the table may actually make the .MYI smaller by preventing tree node splits as much as possible.
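This theory can be sketched with a toy model. The code below is not MyISAM's actual page format, just fixed-capacity sorted leaf pages that split 50/50 when full; page capacity and key counts are made up for illustration:

```python
import bisect
import random

def count_leaf_pages(keys, cap=16):
    """Insert keys one at a time into sorted, fixed-capacity leaf pages.
    A page that overflows is split in half -- a crude model of a BTREE
    leaf split, ignoring interior nodes entirely."""
    pages = [[]]                  # sorted leaf pages
    firsts = [float("-inf")]      # routing keys: first key of each page
    for k in keys:
        i = bisect.bisect_right(firsts, k) - 1
        bisect.insort(pages[i], k)
        if len(pages[i]) > cap:               # overflow -> 50/50 split
            mid = len(pages[i]) // 2
            right = pages[i][mid:]
            pages[i] = pages[i][:mid]
            pages.insert(i + 1, right)
            firsts.insert(i + 1, right[0])
    return len(pages)

n = 10_000
in_order = list(range(n))
shuffled = in_order[:]
random.Random(42).shuffle(shuffled)

# Ordered loading strands every split-off left page at half capacity,
# so the index needs roughly n / (cap/2) pages. Random loading leaves
# pages fuller on average, so the same keys occupy fewer pages.
print("ordered:", count_leaf_pages(in_order))
print("random: ", count_leaf_pages(shuffled))
```

In this toy model, ordered loading leaves settled pages about 50% full, while random loading averages noticeably fuller pages, so the same keys need fewer pages. That is the same effect the ALTER TABLE ... ORDER BY trick below gambles on.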
In order to make this happen, you will have to do something quite unusual. Ready for this?
ALTER TABLE generic_dummy_table_name ORDER BY datetime;
or
ALTER TABLE generic_dummy_table_name ORDER BY user,datetime;
That's right: you can reorder the physical rows of the table. This may make the indexes a different size by inducing BTREE splits on another index.
Try reordering the table. Then, mysqldump it and reload it. You may find that the .MYI gets smaller or bigger; you cannot really predict it.
Next time, try reloading the MyISAM table with a very large bulk insert buffer.
Just run
SET GLOBAL bulk_insert_buffer_size = 1024 * 1024 * 1024;
then connect to mysql and reload the mysqldump.
GIVE IT A TRY !!!
First, you need to know what you are doing to InnoDB when you plow millions of rows into an InnoDB table. Let's take a look at the InnoDB Architecture.
In the upper left corner, there is an illustration of the InnoDB Buffer Pool. Notice there is a section of it dedicated to the insert buffer. What does that do? It is used to migrate changes to secondary indexes from the Buffer Pool to the Insert Buffer inside the system tablespace (a.k.a. ibdata1). By default, innodb_change_buffer_max_size is set to 25. This means that up to 25% of the Buffer Pool can be used for processing secondary indexes.
In your case, you have 6.935 GB for the InnoDB Buffer Pool. A maximum of 1.734 GB will be used for processing your secondary indexes.
Now, look at your table. You have 13 secondary indexes. Each row you process must generate a secondary index entry, couple it with the primary key of the row, and send them as a pair from the Insert Buffer in the Buffer Pool into the Insert Buffer in ibdata1. That happens 13 times with each row. Multiply this by 10 million and you can almost feel a bottleneck coming.
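The arithmetic behind the last two paragraphs can be checked directly (the 6.935 GB buffer pool and 10-million-row figures come from the question; this is just a sanity check):

```python
buffer_pool_gb = 6.935      # InnoDB Buffer Pool size from the question
change_buffer_pct = 25      # default innodb_change_buffer_max_size
secondary_indexes = 13
rows = 10_000_000

# Portion of the buffer pool available for change buffering
change_buffer_gb = buffer_pool_gb * change_buffer_pct / 100
print(round(change_buffer_gb, 3))   # 1.734 GB, as stated above

# Secondary index entries the import must push through it
entries = secondary_indexes * rows
print(entries)                      # 130,000,000 entries
```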
Don't forget that importing 10 million rows in a single transaction will pile up everything into one rollback segment and fill up the UNDO space in ibdata1.
SUGGESTIONS
SUGGESTION #1
My first suggestion for importing this rather large table would be
- Drop all the non-unique indexes
- Import the data
- Create all the non-unique indexes
SUGGESTION #2
Get rid of duplicate indexes. In your case, you have
KEY `party_id` (`party_id`),
KEY `party_id_2` (`party_id`,`status`)
Since both indexes start with party_id, the first one is redundant. You can increase secondary index processing by at least 7.6% by getting rid of one index out of 13. Eventually, you will need to run
ALTER TABLE monster DROP INDEX party_id;
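The percentage above is just one index's share of the thirteen:

```python
secondary_indexes = 13

# One dropped index removes its share of the per-row secondary index work
share = 1 / secondary_indexes * 100
print(round(share, 1))   # ~7.7%, consistent with the "at least 7.6%" above
```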
SUGGESTION #3
Get rid of indexes you do not use. Look over your application code and see if your queries use all the indexes. You may want to look into pt-index-usage to let it suggest what indexes are not being used.
SUGGESTION #4
You should increase the innodb_log_buffer_size to 64M since the default is 8M. A bigger log buffer may increase InnoDB write I/O performance.
EPILOGUE
Putting the first two suggestions in place, do the following:
- Drop the 13 non-unique indexes
- Import the data
- Create all the non-unique indexes except the party_id index
Perhaps the following may help
CREATE TABLE monster_new LIKE monster;
ALTER TABLE monster_new
DROP INDEX `party_id`,
DROP INDEX `creation_date`,
DROP INDEX `email`,
DROP INDEX `hash`,
DROP INDEX `address_hash`,
DROP INDEX `thumbs3`,
DROP INDEX `ext_monster_id`,
DROP INDEX `status`,
DROP INDEX `note`,
DROP INDEX `postcode`,
DROP INDEX `some_id`,
DROP INDEX `cookie`,
DROP INDEX `party_id_2`;
ALTER TABLE monster RENAME monster_old;
ALTER TABLE monster_new RENAME monster;
Import the data into monster. Then, run this
ALTER TABLE monster
ADD INDEX `creation_date` (`creation_date`),
ADD INDEX `email` (`email`(4)),
ADD INDEX `hash` (`hash`(8)),
ADD INDEX `address_hash` (`address_hash`(8)),
ADD INDEX `thumbs3` (`thumbs3`),
ADD INDEX `ext_monster_id` (`ext_monster_id`),
ADD INDEX `status` (`status`),
ADD INDEX `note` (`note`(4)),
ADD INDEX `postcode` (`postcode`),
ADD INDEX `some_id` (`some_id`),
ADD INDEX `cookie` (`cookie`),
ADD INDEX `party_id_2` (`party_id`,`status`);
GIVE IT A TRY !!!
ALTERNATIVE
You could create a table called monster_csv as a MyISAM table with no indexes and do this:
CREATE TABLE monster_csv ENGINE=MyISAM AS SELECT * FROM monster WHERE 1=2;
ALTER TABLE monster RENAME monster_old;
CREATE TABLE monster LIKE monster_old;
ALTER TABLE monster DROP INDEX `party_id`;
Import your data into monster_csv. Then, use mysqldump to create another import
mysqldump -t -uroot -p mydb monster_csv | sed 's/monster_csv/monster/g' > data.sql
The mysqldump file data.sql will contain extended INSERT commands importing 10,000-20,000 rows at a time.
Now, just load the mysqldump
mysql -uroot -p mydb < data.sql
Finally, get rid of the MyISAM table
DROP TABLE monster_csv;
Best Answer
I can't say about the size, but you could check for the last entry (or the number of entries) in the dump and then check for the current last entry in your database. This might help you determine how long the import will take to finish.
The size of the imported data might even end up bigger than the uncompressed 7 GB, since indexes are usually not contained in dumps but get built on insert.
As a side note: this is also a way to speed up the import itself. Drop the indexes during the import and rebuild them later on; this has helped me speed things up several times.