Oracle – Condition in WHERE vs Condition in CONNECT BY


Can anyone explain the difference between the following two queries? They look equivalent, but they return different results.

select concept.concept_id, concept.PARENT_ID 
from ebti_thes_concept_v concept
  start with PARENT_ID = '11025'
  connect by parent_id = prior concept_id
  and exists (..)

In the second one, the exists predicate is moved from the connect by clause to the where clause.

select concept.concept_id, concept.PARENT_ID
from ebti_thes_concept_v concept
where exists (..)
 start with PARENT_ID = '11025'
 connect by parent_id = prior concept_id;

Best Answer

After reading the documentation, I came to the following conclusions.

One difference between the two queries is that the first has two conditions in the connect by clause. To be identified as a child, a row must satisfy not only parent_id = prior concept_id but also the exists predicate.

 connect by parent_id = prior concept_id
 and exists (..)

While the second one has only one condition.

 connect by parent_id = prior concept_id;

The part of the query that makes the distinction more obvious is the start with.

In the first query, all the rows with PARENT_ID = '11025' will be returned and used as root rows. The two conditions of the connect by will be used to find the children of those rows. The exists condition will not be applied to the root rows.

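On the other hand, the second query builds the hierarchy using only the parent_id = prior concept_id condition, and then applies the exists predicate in the where clause to every row of that hierarchy, root rows included. A row that fails the predicate is dropped individually; its descendants are still returned if they pass it themselves.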

As a result, the two queries can return different sets of rows.
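
The difference can be reproduced on a toy hierarchy. The sketch below is only an emulation, not Oracle itself: it uses Python's sqlite3 module and a recursive CTE in place of CONNECT BY, a hypothetical table named concept in place of ebti_thes_concept_v, and a hypothetical keep column standing in for the elided exists (..) predicate. Putting the predicate in the recursive step mirrors the first query; putting it in the outer WHERE mirrors the second.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE concept (concept_id INTEGER PRIMARY KEY, parent_id INTEGER, keep INTEGER);
-- keep is a hypothetical stand-in for the elided exists (..) predicate
INSERT INTO concept VALUES
  (1, 11025, 0),  -- root row, fails the predicate
  (2, 1, 1),      -- child of 1, passes
  (3, 2, 0),      -- child of 2, fails
  (4, 3, 1);      -- child of 3, passes
""")

# Predicate inside the recursive step (like CONNECT BY ... AND exists):
# roots are not tested, and a failing child prunes its whole subtree.
condition_in_connect_by = """
WITH RECURSIVE tree(concept_id, parent_id) AS (
    SELECT concept_id, parent_id FROM concept WHERE parent_id = 11025
    UNION ALL
    SELECT c.concept_id, c.parent_id
    FROM concept c JOIN tree t ON c.parent_id = t.concept_id
    WHERE c.keep = 1
)
SELECT concept_id FROM tree ORDER BY concept_id
"""

# Predicate in the outer WHERE: the full hierarchy is built first,
# then each row is tested on its own.
condition_in_where = """
WITH RECURSIVE tree(concept_id, parent_id, keep) AS (
    SELECT concept_id, parent_id, keep FROM concept WHERE parent_id = 11025
    UNION ALL
    SELECT c.concept_id, c.parent_id, c.keep
    FROM concept c JOIN tree t ON c.parent_id = t.concept_id
)
SELECT concept_id FROM tree WHERE keep = 1 ORDER BY concept_id
"""

connect_by_rows = [row[0] for row in conn.execute(condition_in_connect_by)]
where_rows = [row[0] for row in conn.execute(condition_in_where)]

print(connect_by_rows)  # [1, 2]
print(where_rows)       # [2, 4]
```

In the first result the root row 1 survives even though it fails the predicate, and row 4 is lost because its parent 3 was pruned. In the second result rows 1 and 3 are filtered out individually, but row 4 is still reached through them.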

Related Solutions

Oracle aggregate functions slows query down massively

Thanks for all your input - I seem to have solved it. I recalculated table statistics using:

SQL> EXEC DBMS_STATS.GATHER_TABLE_STATS(OWNNAME=>USER,TABNAME=>'RESULTS_DATA');

After this, the queries worked fine.

Execution Difference Between JOIN Condition and WHERE Condition

According to Chapter 9 (Parser and Optimizer), page 172 of the book Understanding MySQL Internals by Sasha Pachev, the evaluation of a query breaks down into the following tasks:

Determine which keys can be used to retrieve the records from tables, and choose the best one for each table.
For each table, decide whether a table scan is better than reading on a key. If there are a lot of records that match the key value, the advantages of the key are reduced and the table scan becomes faster.
Determine the order in which tables should be joined when more than one table is present in the query.
Rewrite the WHERE clauses to eliminate dead code, reducing the unnecessary computations and changing the constraints wherever possible to open the way for using keys.
Eliminate unused tables from the join.
Determine whether keys can be used for ORDER BY and GROUP BY.
Attempt to simplify subqueries, as well as determine to what extent their results can be cached.
Merge views (expand the view reference as a macro)

On that same page, it says the following:

In MySQL optimizer terminology, every query is a set of joins. The term join is used here more broadly than in SQL commands. A query on only one table is a degenerate join. While we normally do not think of reading records from one table as a join, the same structures and algorithms used with conventional joins work perfectly to resolve the query with only one table.

EPILOGUE

Because of the keys present, the amount of data, and the expression of the query, MySQL Joins may sometimes do things for our own good (or to get back at us) and come up with results we did not expect and cannot quickly explain.

I wrote about this quirkiness before

Jan 23, 2013 : Problem with nested UPDATE queries
Feb 22, 2011 : Problem with MySQL subquery

because the MySQL Query Optimizer could dismiss certain keys during the query's evaluation.

@Phil's comment helped me see how to post this answer (+1 for @Phil's comment)

@ypercube's comment (+1 for this one too) is a compact version of my post because MySQL's Query Optimizer is primitive. Unfortunately, it has to be since it deals with outside storage engines.

CONCLUSION

As for your actual question, the MySQL Query Optimizer would determine the performance metrics of each query when it is done

counting rows
selecting keys
massaging intermediate result sets
Oh yeah, doing the actual JOIN

You would probably have to coerce the order of execution by rewriting (refactoring) the query

Here is the first Query you gave

select count(*)
from   table1 a
join   table2 b
on     b.key_col=a.key_col
where  b.tag = 'Y';

Try rewriting it to evaluate the WHERE first

select count(*)
from   table1 a
join   (select key_col from table2 where tag='Y') b
on     b.key_col=a.key_col;

That would definitely alter the EXPLAIN plan. It could produce better or worse results.
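
Whatever the plan looks like, the two forms should return the same count, so the rewrite only changes how the work is done. A minimal sketch of that check, assuming made-up sample rows and using Python's sqlite3 module as a stand-in (SQLite's planner is not MySQL's, so this says nothing about MySQL's EXPLAIN output or timings):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (key_col INTEGER);
CREATE TABLE table2 (key_col INTEGER, tag TEXT);
-- sample rows are invented for illustration only
INSERT INTO table1 VALUES (1), (2), (3), (3);
INSERT INTO table2 VALUES (1, 'Y'), (2, 'N'), (3, 'Y'), (3, 'Y');
""")

# Original form: join first, filter on tag in the WHERE clause.
join_then_filter = """
select count(*)
from   table1 a
join   table2 b
on     b.key_col=a.key_col
where  b.tag = 'Y'
"""

# Rewritten form: filter table2 in a derived table, then join.
filter_then_join = """
select count(*)
from   table1 a
join   (select key_col from table2 where tag='Y') b
on     b.key_col=a.key_col
"""

count_original = conn.execute(join_then_filter).fetchone()[0]
count_rewritten = conn.execute(filter_then_join).fetchone()[0]

print(count_original, count_rewritten)  # 5 5
```

To compare the plans on a real MySQL server, run EXPLAIN on both statements against the actual tables and indexes.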

I once answered a question on Stack Overflow where I applied this technique. The EXPLAIN was horrendous but the performance was dynamite. It only worked because the correct indexes were present and a LIMIT was used in a subquery.

As with stock prices, when it comes to Queries and trying to express them, restrictions apply, results may vary, and past performance is not indicative of future results.