First, you need to know what you are doing to InnoDB when you plow millions of rows into an InnoDB table. Let's take a look at the InnoDB Architecture.
In the upper left corner, there is an illustration of the InnoDB Buffer Pool. Notice that a section of it is dedicated to the insert buffer. What does that do? It is used to migrate changes to secondary indexes from the Buffer Pool to the Insert Buffer inside the system tablespace (a.k.a. ibdata1). By default, innodb_change_buffer_max_size is set to 25, which means that up to 25% of the Buffer Pool can be used for processing secondary indexes.
In your case, you have 6.935 GB for the InnoDB Buffer Pool. A maximum of 1.734 GB will be used for processing your secondary indexes.
Now, look at your table. You have 13 secondary indexes. Each row you process must generate a secondary index entry, couple it with the primary key of the row, and send them as a pair from the Insert Buffer in the Buffer Pool into the Insert Buffer in ibdata1. That happens 13 times with each row. Multiply this by 10 million and you can almost feel a bottleneck coming.
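If you want to see how this is set on your server, the stock InnoDB variables and status output can be checked like this (a quick read-only look, nothing here changes anything):
-- How much of the Buffer Pool the change buffer may use (25 = 25%) and the Buffer Pool size
SHOW GLOBAL VARIABLES LIKE 'innodb_change_buffer_max_size';
SHOW GLOBAL VARIABLES LIKE 'innodb_buffer_pool_size';
-- The "INSERT BUFFER AND ADAPTIVE HASH INDEX" section shows change buffer activity
SHOW ENGINE INNODB STATUS\G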
Don't forget that importing 10 million rows in a single transaction will pile up everything into one rollback segment and fill up the UNDO space in ibdata1.
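If your import runs as one giant wrapped transaction, one way to relieve that pressure is to commit in chunks. This is just a sketch - the chunk files and their size are made up, and it assumes you are loading from CSV files:
-- With autocommit on, each LOAD DATA commits by itself instead of piling into one rollback segment
SET autocommit = 1;
LOAD DATA INFILE '/tmp/monster_chunk_01.csv' INTO TABLE monster;
LOAD DATA INFILE '/tmp/monster_chunk_02.csv' INTO TABLE monster;
-- ... and so on, one file per chunk of 100K rows or so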
SUGGESTIONS
SUGGESTION #1
My first suggestion for importing this rather large table would be
- Drop all the non-unique indexes
- Import the data
- Create all the non-unique indexes
SUGGESTION #2
Get rid of duplicate indexes. In your case, you have
KEY `party_id` (`party_id`),
KEY `party_id_2` (`party_id`,`status`)
Since both indexes start with party_id, you can speed up secondary index processing by at least 7.6 % by getting rid of one of the 13 indexes. You eventually need to run
ALTER TABLE monster DROP INDEX party_id;
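If you want to double-check which indexes overlap before dropping anything, the index layout can be pulled out of information_schema (the mydb schema name just matches the mysqldump example further down; substitute your own):
SELECT INDEX_NAME, SEQ_IN_INDEX, COLUMN_NAME
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'mydb'
  AND TABLE_NAME = 'monster'
ORDER BY INDEX_NAME, SEQ_IN_INDEX;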
SUGGESTION #3
Get rid of indexes you do not use. Look over your application code and see if your queries use all the indexes. You may want to look into pt-index-usage to let it suggest what indexes are not being used.
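For example, something along these lines (the slow log path and credentials are placeholders for your own):
pt-index-usage /var/log/mysql/mysql-slow.log --host localhost --user root --ask-pass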
SUGGESTION #4
You should increase the innodb_log_buffer_size to 64M since the default is 8M. A bigger log buffer may increase InnoDB write I/O performance.
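For example, in my.cnf (this variable is not dynamic on most MySQL versions, so it takes a restart to apply):
[mysqld]
innodb_log_buffer_size = 64M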
EPILOGUE
Putting the first two suggestions in place, do the following:
- Drop the 13 non-unique indexes
- Import the data
- Create all the non-unique indexes except the party_id index
Perhaps the following may help
CREATE TABLE monster_new LIKE monster;
ALTER TABLE monster_new
DROP INDEX `party_id`,
DROP INDEX `creation_date`,
DROP INDEX `email`,
DROP INDEX `hash`,
DROP INDEX `address_hash`,
DROP INDEX `thumbs3`,
DROP INDEX `ext_monster_id`,
DROP INDEX `status`,
DROP INDEX `note`,
DROP INDEX `postcode`,
DROP INDEX `some_id`,
DROP INDEX `cookie`,
DROP INDEX `party_id_2`;
ALTER TABLE monster RENAME monster_old;
ALTER TABLE monster_new RENAME monster;
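Before importing, it may be worth confirming that monster now carries only the PRIMARY KEY and unique indexes:
SHOW CREATE TABLE monster\G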
Import the data into monster. Then, run this:
ALTER TABLE monster
ADD INDEX `creation_date` (`creation_date`),
ADD INDEX `email` (`email`(4)),
ADD INDEX `hash` (`hash`(8)),
ADD INDEX `address_hash` (`address_hash`(8)),
ADD INDEX `thumbs3` (`thumbs3`),
ADD INDEX `ext_monster_id` (`ext_monster_id`),
ADD INDEX `status` (`status`),
ADD INDEX `note` (`note`(4)),
ADD INDEX `postcode` (`postcode`),
ADD INDEX `some_id` (`some_id`),
ADD INDEX `cookie` (`cookie`),
ADD INDEX `party_id_2` (`party_id`,`status`);
GIVE IT A TRY !!!
ALTERNATIVE
You could create a table called monster_csv as a MyISAM table with no indexes and do this:
CREATE TABLE monster_csv ENGINE=MyISAM AS SELECT * FROM monster WHERE 1=2;
ALTER TABLE monster RENAME monster_old;
CREATE TABLE monster LIKE monster_old;
ALTER TABLE monster DROP INDEX `party_id`;
Import your data into monster_csv. Then, use mysqldump to create another import:
mysqldump -t -uroot -p mydb monster_csv | sed 's/monster_csv/monster/g' > data.sql
The mysqldump file data.sql will contain extended INSERT commands importing 10,000-20,000 rows at a time.
Now, just load the mysqldump
mysql -uroot -p mydb < data.sql
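Before dropping the staging table, a quick sanity check that both tables hold the same number of rows can't hurt:
SELECT COUNT(*) FROM monster_csv;
SELECT COUNT(*) FROM monster;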
Finally, get rid of the MyISAM table
DROP TABLE monster_csv;
The COMMIT is fairly obvious; it's committing the effects of the prior ALTER TABLE ... DETACH statement (and any statements between it and the preceding COMMIT/ROLLBACK - transactions are implicitly started in DB2). I assume there's a bit more to that DETACH statement, but that it's omitted for brevity.
The SET INTEGRITY statement (I'd strongly recommend reading that page, especially the Notes section) is more complex. In this case it's bringing the table out of check-pending status by telling the database engine that the table's data still conforms to all the constraints on the table, but that the engine shouldn't bother checking that itself. This is as opposed to SET INTEGRITY ... IMMEDIATE CHECKED, in which the database engine would check the data is correct.
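For illustration only (the table name is a placeholder), the two variants being contrasted look roughly like this:
-- take the table out of check-pending, trusting the existing data as-is
SET INTEGRITY FOR myschema.mytable ALL IMMEDIATE UNCHECKED;
-- versus having the engine actually verify the constraints
SET INTEGRITY FOR myschema.mytable IMMEDIATE CHECKED;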
The table has presumably been left in check-pending state by some earlier operation (I don't think detaching partitions causes check-pending, although attaching them certainly does - see ALTER TABLE for more information).
As the Notes section of the SET INTEGRITY reference page states:
The fact that the table was taken out of the set integrity pending
state without performing the required integrity processing will be
recorded in the catalog (the respective byte in the CONST_CHECKED
column in the SYSCAT.TABLES view will be set to 'U'). This indicates
that the user has assumed responsibility for data integrity with
respect to the specific constraints...
I'd strongly recommend reading the rest of the Notes section for more info on using IMMEDIATE UNCHECKED.
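If you want to see that flag for yourself, a query along these lines against the catalog should show it (schema and table names are placeholders):
SELECT TABSCHEMA, TABNAME, STATUS, CONST_CHECKED
FROM SYSCAT.TABLES
WHERE TABSCHEMA = 'MYSCHEMA' AND TABNAME = 'MYTABLE';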
Best Answer
Short answer: You are right in that you cannot specify an exception table for the IMPORT command. Also, there is no need to run the SET INTEGRITY statement if you use IMPORT. Here is why:
The biggest reason for SET INTEGRITY is based on how LOAD and IMPORT work. IMPORT uses normal INSERT statements to get the data into the tables. Doing this engages the database transaction logs, triggers, and referential integrity. If a row is rejected by referential integrity, you will know about it. LOAD bypasses referential integrity, triggers, and database transaction logs, and puts the data directly into the table (though it cannot bypass unique constraints and identity constraints). The main reason why you want an exception table is to catch the data that doesn't load into the table (perhaps it was the wrong data type, etc.).
Because you bypass referential integrity, you MUST do a SET INTEGRITY afterwards to make the table aware of the data it now contains, to see if anything does violate referential integrity (otherwise tables are put into a CHECK PENDING state until you do so). (It is also a good idea to run a RUNSTATS, REORG, FLUSH PACKAGE CACHE DYNAMIC, and REBIND afterwards too, to make sure that indexes and the optimizer know about the new data. Actually, it is a good idea to do that anyway if you do a mass IMPORT ... but that aside.)
Note that you can use the MESSAGES option with both LOAD and IMPORT to capture errors that happen so you know how to deal with them (usually after the fact, though). Note that if you use both MESSAGES for a message file and FOR EXCEPTION <table name> for an exception table, any rows in error that get put in the exception table don't get listed in the message file.
Here is a good comparison of LOAD vs IMPORT straight from the IBM DB2 documentation (version 9.7)
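As a rough sketch of the difference, runnable from the DB2 command line processor (file, schema, and table names are placeholders, and the exception table is assumed to already exist with a compatible layout):
-- IMPORT: ordinary logged INSERTs; RI and triggers fire row by row
IMPORT FROM /tmp/data.del OF DEL MESSAGES /tmp/import.msgs INSERT INTO myschema.mytable
-- LOAD: bypasses logging/RI/triggers; bad rows land in the exception table
LOAD FROM /tmp/data.del OF DEL MESSAGES /tmp/load.msgs INSERT INTO myschema.mytable FOR EXCEPTION myschema.mytable_exc
-- afterwards, bring the table out of check-pending while verifying the constraints
SET INTEGRITY FOR myschema.mytable IMMEDIATE CHECKED FOR EXCEPTION IN myschema.mytable USE myschema.mytable_exc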