I asked this question on SO but havn't got many bites and even a vote to close for being "Off Topic." So, I thought this stack exchange might be better suited.
Consider the following SQL…
SELECT allStuff.id
FROM (SELECT id, 1 AS clazz_
FROM MyStuff
UNION ALL
SELECT id, 2 AS clazz_
FROM MyWarehousedStuff) allStuff
LEFT OUTER JOIN
Chilren child
ON allStuff.id = child.id
WHERE allStuff.id IN (/*fast sub select */)
AND allStuff.clazz_= ? /* if I remove this line the query is fast (index fast)
but if it stays in then DB2 does a tablescan
on MyStuff and MyWarehousedStuff (very expensive) */
To make things a bit more confusing, if I remove the LEFT OUTER JOIN
or make it an INNER JOIN
DB2 does the right thing. I find it highly unusuall that an AND would cause worse perforamnce. I am also surprised by the use of a table scan when the field in question isn't actually on the table and is constant.
Note, this is a cleaned up Hibernate query so I'm not to worried about fixing the SQL in question but rather I want to know why DB2 is acting a particular way.
What would be causing this behavior?
[update] – It appears that even adding allStuff.clazz_
to the column list is enough to trigger the table scanning.
[update] – Also seems that the IN
subslect is part of the problem. Switching that to a literal subselect IN (?, ?)
is fast but IN (select 1 from SYSIBM.SYSDUMMY1)
is not
[update] – Using DB2 v9.1 z/OS
[update] – To be clear, the following is also slow
SELECT allStuff.id, allStuff.clazz_
FROM (SELECT id, 1 AS clazz_
FROM MyStuff
UNION ALL
SELECT id, 2 AS clazz_
FROM MyWarehousedStuff) allStuff
LEFT OUTER JOIN
Chilren child
ON allStuff.id = child.id
WHERE allStuff.id IN (select 1 from sysibm.sysdummy1)
But this is fast…
SELECT allStuff.id, allStuff.clazz_
FROM (SELECT id, 1 AS clazz_
FROM MyStuff
UNION ALL
SELECT id, 2 AS clazz_
FROM MyWarehousedStuff) allStuff
LEFT OUTER JOIN
Chilren child
ON allStuff.id = child.id
WHERE allStuff.id IN (?, ?)
Fast…
SELECT allStuff.id, allStuff.clazz_
FROM (SELECT id, 1 AS clazz_
FROM MyStuff
UNION ALL
SELECT id, 2 AS clazz_
FROM MyWarehousedStuff) allStuff
INNER JOIN
Chilren child
ON allStuff.id = child.id
WHERE allStuff.id IN (?, ?)
Fast…
SELECT allStuff.id, allStuff.clazz_
FROM (SELECT id, 1 AS clazz_
FROM MyStuff
UNION ALL
SELECT id, 2 AS clazz_
FROM MyWarehousedStuff) allStuff
WHERE allStuff.id IN (select 1 from sysibm.sysdummy1)
Fast…
SELECT allStuff.id
FROM (SELECT id, 1 AS clazz_
FROM MyStuff
UNION ALL
SELECT id, 2 AS clazz_
FROM MyWarehousedStuff) allStuff
LEFT OUTER JOIN
Chilren child
ON allStuff.id = child.id
WHERE allStuff.id IN (select 1 from sysibm.sysdummy1)
Best Answer
Just off the top of my head, it makes sense that it needs to tablescan those two tables. They are both present in a
UNION ALL
. This essentially means all records are in consideration. You then also use aLEFT OUTER JOIN
meaning everything from the left side (your two union'd) tables even if it doesn't match the right side (your Children table). This is telling DB2 that no matter what every possible record in the two tables of your UNION ALL is under consideration.Now...to fix that problem. I would ask, do you have any indexes over the clazz column in both tables? Since clazz is a predicate of your where clause you may wish to have a index over that field in both tables (MyStuff and MyWarehouseStuff). You don't need an index over the id fields (as long as they are primary keys), because you gain that by default in DB2.
To better verify what DB2 is doing, you should use the visual explain (of course you may have done that). It should show you why it is picking what it is. A few other things to keep in mind when examining the explain, do you run stats? If you don't, you should. Running stats will keep DB2 better informed as to statistics about its tables, data, indexes, etc, and may cause it to choose a better access path. Also, this may seem a silly question, what version and fix pack of DB2 are you on? Occasionally certain versions/fix packs end up buggy and there are performance issues due to poor access path selection by the optimizer. An example of this, if you are on DB2 9.7 then you want to be on fix pack 4 or 5. There is at least one fix pack below those that had serious performance issues due to a bug in the optimizer.
Anyway, check out your indexes first. Adding one to both the tables I mentioned may solve your problem. Worse case, I suppose you could re-write your query to put the clazz predicate up into the sub select "table". That may cause the optimizer to eliminate rows before it unions and joins. Just another thought.