Sql-server – Why does nested loops join only support left joins

database-internalsjoin;sql server

In Craig Freedman's blog, Nested Loops Join, he explains why the nested loops join cannot support a right outer join:

The problem is that we scan the inner table multiple times – once for
each row of the outer join. We may encounter the same inner rows
multiple times during these multiple scans.
At what point can we conclude that a particular inner row has not or will not join?

Can someone please explain this in a really simple and educational way?

Does it mean that the loop starts with the outer table (R1) and the scans the inner (R2)?

I understand that for a R1 value that doesn't join with R2, it should be replaced with a NULL so the result set becomes (NULL, R2). For me it seems impossible to return an R2 value when R1 does not join, for the reason that it cannot know which R2 value to return. But that's not the way it's explained. Or is it?

SQL Server does in fact optimize (and often replaces) RIGHT JOIN with LEFT JOIN, but the question is to explain why it's technically impossible for a NESTED LOOPS JOIN to use/support RIGHT JOIN logic.

Best Answer

The main issue here is the implementation of an outer join, using nested loops, in a technical way which is opposite to the logical way, where the inner table is accessed through the outer loop and the outer table is accessed through the inner loop.

Given tables A and B, let's implement A LEFT JOIN B.

A
--
1
2

B
_
1
3

First, let's do it in the "natural" way.

We iterate through A.
We access record 1.
We iterate through B.
We find record 1 in B and output 1-1.

We keep iterating through A.
We access record 2.
We iterate through B.
We don't find any match in B.
We output 2-null.

Now, let's do it in the "opposite" way.

We iterate through B.
We access record 1.
We iterate through A.
We find record 1 in A and output 1-1.

We keep iterating through B.
We access record 3.
We iterate through A.
We don't find any match in A.

Now remember that it was A LEFT JOIN B, which means that in addition to 1-1 we should output 2-null.
The problem is that at that point, we have no idea for which records id A we already have a match (1) and for which records we don't (2).


This can actually be solved in various ways e.g. by holding a bit array for table A.
When an A record is being found as a match we mark it in the bit array.
At the end of the nested loops we are going through the bit array and output and output any record that was not marked.
This is obviously more complicated than the "natural" nested loop.