First, you need to know what you are doing to InnoDB when you plow millions of rows into an InnoDB table. Let's take a look at the InnoDB Architecture.
In the upper left corner, there is an illustration of the InnoDB Buffer Pool. Notice that a section of it is dedicated to the insert buffer. What does that do? It is used to migrate changes to secondary indexes from the Buffer Pool to the Insert Buffer inside the system tablespace (a.k.a. ibdata1). By default, innodb_change_buffer_max_size is set to 25, which means that up to 25% of the Buffer Pool can be used for processing secondary indexes.
In your case, you have 6.935 GB for the InnoDB Buffer Pool. A maximum of 1.734 GB will be used for processing your secondary indexes.
Now, look at your table. You have 13 secondary indexes. Each row you process must generate a secondary index entry, couple it with the primary key of the row, and send them as a pair from the Insert Buffer in the Buffer Pool into the Insert Buffer in ibdata1. That happens 13 times with each row. Multiply this by 10 million and you can almost feel a bottleneck coming.
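If you want to see how this is set on your server, the stock InnoDB variables and status output can be checked like this (a quick read-only look, nothing here changes anything):
-- How much of the Buffer Pool the change buffer may use (25 = 25%) and the Buffer Pool size
SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffer_max_size';
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
-- The "INSERT BUFFER AND ADAPTIVE HASH INDEX" section shows change buffer activity
SHOW ENGINE INNODB STATUS\G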
Don't forget that importing 10 million rows in a single transaction will pile up everything into one rollback segment and fill up the UNDO space in ibdata1.
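If your import runs as one giant wrapped transaction, one way to relieve that pressure is to commit in chunks. This is just a sketch - the chunk files and their size are made up, and it assumes you are loading from CSV files:
-- With autocommit on, each LOAD DATA commits by itself instead of piling into one rollback segment
SET autocommit = 1;
LOAD DATA INFILE '/tmp/monster_chunk_01.csv' INTO TABLE monster;
LOAD DATA INFILE '/tmp/monster_chunk_02.csv' INTO TABLE monster;
-- ... and so on, one file per chunk of 100K rows or so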
SUGGESTIONS
SUGGESTION #1
My first suggestion for importing this rather large table would be
- Drop all the non-unique indexes
- Import the data
- Create all the non-unique indexes
SUGGESTION #2
Get rid of duplicate indexes. In your case, you have
KEY `party_id` (`party_id`),
KEY `party_id_2` (`party_id`,`status`)
Since both indexes start with party_id, you can speed up secondary index processing by at least 7.6 % by getting rid of one of the 13 indexes. You eventually need to run
ALTER TABLE monster DROP INDEX party_id;
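If you want to double-check which indexes overlap before dropping anything, the index layout can be pulled out of information_schema (the mydb schema name just matches the mysqldump example further down; substitute your own):
SELECT INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME = 'monster'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;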
SUGGESTION #3
Get rid of indexes you do not use. Look over your application code and see if your queries use all the indexes. You may want to look into pt-index-usage to let it suggest what indexes are not being used.
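For example, something along these lines (the slow log path and credentials are placeholders for your own):
pt-index-usage /var/log/mysql/mysql-slow.log --host localhost --user root --ask-pass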
SUGGESTION #4
You should increase the innodb_log_buffer_size to 64M since the default is 8M. A bigger log buffer may increase InnoDB write I/O performance.
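For example, in my.cnf (this variable is not dynamic on most MySQL versions, so it takes a restart to apply):
[mysqld]
innodb_log_buffer_size = 64M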
EPILOGUE
Putting the first two suggestions in place, do the following:
- Drop the 13 non-unique indexes
- Import the data
- Create all the non-unique indexes except the party_id index
Perhaps the following may help
CREATE TABLE monster_new LIKE monster;
ALTER TABLE monster_new
DROP INDEX `party_id`,
DROP INDEX `creation_date`,
DROP INDEX `email`,
DROP INDEX `hash`,
DROP INDEX `address_hash`,
DROP INDEX `thumbs3`,
DROP INDEX `ext_monster_id`,
DROP INDEX `status`,
DROP INDEX `note`,
DROP INDEX `postcode`,
DROP INDEX `some_id`,
DROP INDEX `cookie`,
DROP INDEX `party_id_2`;
ALTER TABLE monster RENAME monster_old;
ALTER TABLE monster_new RENAME monster;
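Before importing, it may be worth confirming that monster now carries only the PRIMARY KEY and unique indexes:
SHOW CREATE TABLE monster\G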
Import the data into monster. Then, run this:
ALTER TABLE monster
ADD INDEX `creation_date` (`creation_date`),
ADD INDEX `email` (`email`(4)),
ADD INDEX `hash` (`hash`(8)),
ADD INDEX `address_hash` (`address_hash`(8)),
ADD INDEX `thumbs3` (`thumbs3`),
ADD INDEX `ext_monster_id` (`ext_monster_id`),
ADD INDEX `status` (`status`),
ADD INDEX `note` (`note`(4)),
ADD INDEX `postcode` (`postcode`),
ADD INDEX `some_id` (`some_id`),
ADD INDEX `cookie` (`cookie`),
ADD INDEX `party_id_2` (`party_id`,`status`);
GIVE IT A TRY !!!
ALTERNATIVE
You could create a table called monster_csv as a MyISAM table with no indexes and do this:
CREATE TABLE monster_csv ENGINE=MyISAM AS SELECT * FROM monster WHERE 1=2;
ALTER TABLE monster RENAME monster_old;
CREATE TABLE monster LIKE monster_old;
ALTER TABLE monster DROP INDEX `party_id`;
Import your data into monster_csv. Then, use mysqldump to create another import:
mysqldump -t -uroot -p mydb monster_csv | sed 's/monster_csv/monster/g' > data.sql
The mysqldump file data.sql will contain extended INSERT commands importing 10,000-20,000 rows at a time.
Now, just load the mysqldump
mysql -uroot -p mydb < data.sql
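Before dropping the staging table, a quick sanity check that both tables hold the same number of rows can't hurt:
SELECT COUNT(*) FROM monster_csv;
SELECT COUNT(*) FROM monster;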
Finally, get rid of the MyISAM table
DROP TABLE monster_csv;
The COMMIT is fairly obvious; it's committing the effects of the prior ALTER TABLE ... DETACH statement (and any statements between it and the preceding COMMIT/ROLLBACK - transactions are implicitly started in DB2). I assume there's a bit more to that DETACH statement, but that it's omitted for brevity.
The SET INTEGRITY statement (I'd strongly recommend reading that page, especially the Notes section) is more complex. In this case it's bringing the table out of check-pending status by telling the database engine that the table's data still conforms to all the constraints on the table, but that the engine shouldn't bother checking that itself. This is as opposed to SET INTEGRITY ... IMMEDIATE CHECKED, in which the database engine would check the data is correct.
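For illustration only (the table name is a placeholder), the two variants being contrasted look roughly like this:
-- take the table out of check-pending, trusting the existing data as-is
SET INTEGRITY FOR myschema.mytable ALL IMMEDIATE UNCHECKED;
-- versus having the engine actually verify the constraints
SET INTEGRITY FOR myschema.mytable IMMEDIATE CHECKED;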
The table has presumably been left in check-pending state by some earlier operation (I don't think detaching partitions causes check-pending, although attaching them certainly does - see ALTER TABLE for more information).
As the Notes section of the SET INTEGRITY reference page states:
The fact that the table was taken out of the set integrity pending
state without performing the required integrity processing will be
recorded in the catalog (the respective byte in the CONST_CHECKED
column in the SYSCAT.TABLES view will be set to 'U'). This indicates
that the user has assumed responsibility for data integrity with
respect to the specific constraints...
I'd strongly recommend reading the rest of the Notes section for more info on using IMMEDIATE UNCHECKED.
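If you want to see that flag for yourself, a query along these lines against the catalog should show it (schema and table names are placeholders):
SELECT TABSCHEMA, TABNAME, STATUS, CONST_CHECKED
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'MYSCHEMA' AND TABNAME = 'MYTABLE';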
Best Answer
Short answer: You are right in that you cannot specify an exception table for the IMPORT command. Also, there is no need to run the SET INTEGRITY statement if you use IMPORT. Here is why:
The biggest reason for SET INTEGRITY is based on how LOAD and IMPORT work. IMPORT uses normal INSERT statements to get the data into the tables. Doing this engages the database transaction logs, triggers, and referential integrity. If a row is rejected by referential integrity, you will know about it. LOAD bypasses referential integrity, triggers, and database transaction logs, and puts the data directly into the table (though it cannot bypass unique constraints and identity constraints). The main reason why you want an exception table is to catch the data that doesn't load into the table (perhaps it was the wrong data type, etc.).
Because you bypass referential integrity, you MUST do a SET INTEGRITY afterwards to make the table aware of the data it now contains, to see if anything does violate referential integrity (otherwise tables are put into a CHECK PENDING state until you do so). (It is also a good idea to run a RUNSTATS, REORG, FLUSH PACKAGE CACHE DYNAMIC, and REBIND afterwards too, to make sure that indexes and the optimizer know about the new data. Actually, it is a good idea to do that anyway if you do a mass IMPORT ... but that aside.)
Note that you can use the MESSAGES option with both LOAD and IMPORT to capture errors that happen so you know how to deal with them (usually after the fact, though). Note that if you use both MESSAGES for a message file and FOR EXCEPTION <table name> for an exception table, any rows in error that get put in the exception table don't get listed in the message file.
Here is a good comparison of LOAD vs IMPORT straight from the IBM DB2 documentation (version 9.7)
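As a rough sketch of the difference, runnable from the DB2 command line processor (file, schema, and table names are placeholders, and the exception table is assumed to already exist with a compatible layout):
-- IMPORT: ordinary logged INSERTs; RI and triggers fire row by row
IMPORT FROM /tmp/data.del OF DEL MESSAGES /tmp/import.msgs INSERT INTO myschema.mytable
-- LOAD: bypasses logging/RI/triggers; bad rows land in the exception table
LOAD FROM /tmp/data.del OF DEL MESSAGES /tmp/load.msgs INSERT INTO myschema.mytable FOR EXCEPTION myschema.mytable_exc
-- afterwards, bring the table out of check-pending while verifying the constraints
SET INTEGRITY FOR myschema.mytable IMMEDIATE CHECKED FOR EXCEPTION IN myschema.mytable USE myschema.mytable_exc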