Db2 – INSERT INTO with SELECT subquery db2 9.5

db2db2-9.5performancequery-performance

I have the following query that takes 4 seconds to execute:

SELECT TX.* FROM TABLE_X TX
  INNER JOIN TABLE_Y TY
    ON TY.VALUE_ID = VALUE_ID_IN
      AND TY.PROD_ID = TX.PROD_ID
WHERE (TX.PLACE_ID1 = VAR_PLACE_ID OR TX.PLACE_ID2 = PLACE_ID)
  AND (TX.DATE > TY.FIRST_DATE)
  AND (TX.TYPE_ID != VAR_TYPE_ID1_IN AND TX.TYPE_ID != VAR_TYPE_ID2_IN)
  AND (TX.COL1 != COL1_IN OR TX.COL2 != COL2_IN)

However, I want to store it in a TEMPORARY TABLE. When I execute the following query it takes 50 seconds:

INSERT INTO SESSION.TEMP_TABLE_X   /*new line to store result*/
    SELECT TX.* FROM TABLE_X TX
      INNER JOIN TABLE_Y TY
        ON TY.VALUE_ID = VALUE_ID_IN
          AND TY.PROD_ID = TX.PROD_ID
    WHERE (TX.PLACE_ID1 = VAR_PLACE_ID OR TX.PLACE_ID2 = PLACE_ID)
      AND (TX.DATE > TY.FIRST_DATE)
      AND (TX.TYPE_ID != VAR_TYPE_ID1_IN AND TX.TYPE_ID != VAR_TYPE_ID2_IN)
      AND (TX.COL1 != COL1_IN OR TX.COL2 != COL2_IN)

TABLE_X = 300 000 rows

TABLE_Y = 90 000 rows

How can I reduce the 50 seconds running time?

The server is running DB2 v9.5.

Best Answer

I'll put this as an answer as it is easier to comment on.

Generally you need to run RUNSTATS on every table (I'd vote for at least once a week). This tells the optimizer the number of records (and if specified, which I do recommend) the distribution of data across the records. With this information the optimizer can make better decisions regarding access paths.

The optimizer also uses packages as well. REBINDING the packages essentially alters the packages (ie, their access paths) based on the RUNSTATS found.

In your case you could run RUNSTATS on the tables you care about

RUNSTATS ON TABLE <schema>.<table> WITH DISTRIBUTION AND DETAILED INDEXES ALL

If you have a lot of tables, then I'd recommend hunting for them dynamically

--do note syntax is for Unix, having to worry about escape characters and what not.
db2 -tnx  "select 'runstats on table ' || 
ltrim(rtrim(tabschema))||'.'||ltrim(rtrim(tabname))
|| ' with distribution and detailed indexes all ;'
from syscat.tables where type = 'T' and ownertype = 'U' and 
tabschema not in ('SYSCAT','SYSIBM','SYSIBMADM','SYSPUBLIC','SYSSTAT','SYSTOOLS') 
order by tabschema asc" >> runstats.sql

You can then read this in to your database and execute it

db2 -tvf runstats.sql

REBINDs I find the easiest to do using the db2rbind utility. It beats having to know all the specific package names. Since we are a small shop I just tend to use the following syntax each time.

db2rbind <my db> -l rebind.log all -r any

If there are errors, you will see it in the log.

So generally you would want to issue the RUNSTATS first, then issue a COMMIT to release any locks. Then you can run the REBINDs through db2rbind. If you have not run these in a while you may wish to throw a db2 flush package cache dynamic in between the RUNSTATS and the db2rbind.

Related Solutions

SQL Server 2005 – INSERT Performance: Temporary Tables vs Table Variables

The obvious difference between the two plans is that the fast one is parallel and the slower one serial.

This is one of the limitations of plans that insert into table variables. As mentioned in the comments (and it seems as though it had the desired effect) you could try doing

INSERT INTO @DATA ( ... ) 
EXEC('SELECT .. FROM ...')

to see if that gets around the limitation.

Mysql – Does MySQL have problems with nested insert like with subqueries in where

Using OR in the JOIN part of your query results in the query to be performed for all 4 million records returned from the result set of the left hand side OR clause.

So using a solution with UNION is definitely faster.

Best Answer

Related Solutions

SQL Server 2005 – INSERT Performance: Temporary Tables vs Table Variables

Mysql – Does MySQL have problems with nested insert like with subqueries in where

Related Question