Db2 – Table orders regarding Nested loop join in DB2

db2join;

First I would like to cite the existing post Table order in Join Query

so the author is saying that the table order really doesn't make things(result, access plan)different. I have doubt on this regarding the access plan.

so the documentation of DB2 (TABLE JOIN) tells in the NLJoin, outer table is fixed and scanned only once, while the inner tables might be scanned multiple times depend on different scenarios, under this circumstance, I think the order of tables in the join really matters, we need to put big table as outer table, as we only want to scan this once.

is this the case, is this only for DB2? did I miss something?

+is there general guidance when writing SQL, how should we deal with the table join orders ?

Best Answer

You are right that for Nested Loop Join, the choice of which table is the inner and which the outer table matters for perforamnce.

However, there is nothing in the documentation, in the link you provided, that implies that for a query that has a INNER JOIN b, the table a will be used as inner and b as outer table when the Nested Loop Join algorithm is selected.

Any decent optimizer evaluates many different combinations of algorithms, placing of tables and order of execution, so I don't think there is any difference if you write a INNER JOIN b or b INNER JOIN a, the chosen execution plans should be the same in both cases. If there are exceptions to this, I would expect them to be for very complex queries with tens of joined tables and/or multiple groupings.

Testing and checking the actual execution plans is one way to confirm this. Another would be to analyze the source code of the query optimizer.


The general guidance when writing SQL (in whatever DBMS), is not to care at all about the table join orders. SQL code describes what you want as result, it doesn't tell the DBMS how to get it. And many optimizers now are really smarter and faster than most of us in choosing the best execution plan most of the time.

Unless documentation shows that the optimizer is naive or in a very early version and the way the queries are written, really affects the chosen execution plan.

Or testing/running of a specific query shows that it's slow and some obviously good plan was not chosen. Then you can experiment with hints (if the DBMS has such feature), try rewriting the query in different ways, check if statistics are updated, etc.