If you look at the 2 execution plans, is there an easy answer to which is better? I purposefully did NOT create indexes so it's easier to see what's happening.
The second plan has a lower estimated cost, so in that limited sense it is 'better'.
The data sets are so small that the optimizer did not spend much time looking at alternatives. The first form of the query happens to find a plan using hash join and a table spool early on. The estimated cost of that plan is so low that the optimizer does not bother looking for anything better.
The second form of the query happens to find a plan using only nested loops outer joins early in the search process, and again the optimizer decides that plan is good enough. It so happens that this plan is estimated to be cheaper.
That said (as mentioned in the question comments) the two queries are not semantically identical. This may not be important to you if you can guarantee that the results will always be the same for all possible future states of your database, but the optimizer cannot make that assumption. It only ever produces plans that are guaranteed to produce the same results specified by the SQL, in all circumstances.
I have realized that the nested syntax also modifies the behaviour of the query.
The 'nested syntax' is just one aspect of the whole ANSI join syntax specification. To enable a full logical specification for more complex join patterns, the specification allows (optional) parentheses, and FROM
clause subqueries.
The query can be written using the same ANSI syntax using parentheses:
SELECT
A.*,
M.*,
N.*
FROM dbo.Autos AS A
LEFT JOIN
(
dbo.Manufacturers AS N
JOIN dbo.Models AS M
ON M.ManufacturerID = N.ManufacturerID
) ON M.ModelID = A.ModelID;
This form clearly shows that the logical requirement is to left join from Autos
to the result of inner joining Manufacturers
to Models
. Omitting the optional parentheses gives the form you call 'nested':
SELECT
A.*,
M.*,
N.*
FROM dbo.Autos AS A
LEFT JOIN dbo.Manufacturers AS N
JOIN dbo.Models AS M
ON M.ManufacturerID = N.ManufacturerID
ON M.ModelID = A.ModelID;
This is not a different syntax - it is just omitting optional parentheses and reformatting a bit.
As Martin mentioned, it is also possible in this case to express the logical requirement using inner joins followed by a right outer join:
SELECT
A.*,
M.*,
N.*
FROM dbo.Manufacturers AS N
JOIN dbo.Models AS M
ON M.ManufacturerID = N.ManufacturerID
RIGHT JOIN dbo.Autos AS A
ON A.ModelID = M.ModelID;
All three query forms above use the same ANSI join syntax. All three also happen to produce the same physical execution plan with the data set provided:
As I mentioned in my answer to your previous question, queries that express exactly the same logical requirement will not always produce the same execution plan. Which logical query form you prefer to use is largely a question of style. There is no correlation between one particular style and 'better' query plans in general. I would generally advise against rewriting a query to get a particular plan if the new query is not genuinely logically identical to the original.
The SQL standard also allows FROM
clause subqueries, so yet another way to write the same query specification is:
SELECT *
FROM dbo.Autos AS A
LEFT JOIN
(
SELECT
N.ManufacturerID,
ManufacturerName = N.Name,
M.ModelID,
ModelName = M.Name
FROM dbo.Manufacturers AS N
JOIN dbo.Models AS M
ON M.ManufacturerID = N.ManufacturerID
) AS R1
ON R1.ModelID = A.ModelID;
Using the traditional syntax, we have to change the join to `Manufacturers to an outer join, like so... but this changes the query plan.
This probably changes the meaning of the query, in which case it is technically not a valid alternative (but see ypercube's comment on your question).
The (optional) parentheses in the ANSI join syntax are there precisely for more complex join requirements like this, so you should not be afraid to use them where necessary.
They probably do this because storage allocation on the mainframe is much more complex than for unix or windows platforms.
zOS is basically a very modern sophisticated piece of harware and operating system that is emulating a 1960s mainframe. In much the same ways as the latest Intel chips are emulating a 1980s 386 chip.
The problem is that the very low level close to the wire disk I/O which was necessary to get high bandwidth and and quick access times on the original hardware is "baked in" to the OS and many applications. So all disk storage for legacy apps deals with channels and volumes. A volume is a "pretend" 3330 disk with 8GB or 32GB of storage usually allocated as a piece of a real nTerabyte raid system.
To define tables properly in zOS DB2 you need to create tables spaces which in turn requires you to allocate storage on specific volumes this is fairly messy as four or five entities need to be linked together TABLE -> TABLESPACE -> Volume Group -> Disk Volumes + Index -> IndexTablespace -> Volume Group -> Disk Volumes.
Basically your DBAs are saving you a lot of grief by not letting you play with the matches.
P.S. If you use the modern half of zOS (Unix Systems Services) this all becomes irrelevant but it's going to be a long slow migration!
Best Answer
You are right that for Nested Loop Join, the choice of which table is the inner and which the outer table matters for perforamnce.
However, there is nothing in the documentation, in the link you provided, that implies that for a query that has
a INNER JOIN b
, the tablea
will be used as inner andb
as outer table when the Nested Loop Join algorithm is selected.Any decent optimizer evaluates many different combinations of algorithms, placing of tables and order of execution, so I don't think there is any difference if you write
a INNER JOIN b
orb INNER JOIN a
, the chosen execution plans should be the same in both cases. If there are exceptions to this, I would expect them to be for very complex queries with tens of joined tables and/or multiple groupings.Testing and checking the actual execution plans is one way to confirm this. Another would be to analyze the source code of the query optimizer.
The general guidance when writing SQL (in whatever DBMS), is not to care at all about the table join orders. SQL code describes what you want as result, it doesn't tell the DBMS how to get it. And many optimizers now are really smarter and faster than most of us in choosing the best execution plan most of the time.
Unless documentation shows that the optimizer is naive or in a very early version and the way the queries are written, really affects the chosen execution plan.
Or testing/running of a specific query shows that it's slow and some obviously good plan was not chosen. Then you can experiment with hints (if the DBMS has such feature), try rewriting the query in different ways, check if statistics are updated, etc.