In Craig Freedman's blog, Nested Loops Join, he explains why the nested loops join cannot support a right outer join:
The problem is that we scan the inner table multiple times – once for
each row of the outer join. We may encounter the same inner rows
multiple times during these multiple scans.
At what point can we conclude that a particular inner row has not or will not join?
Can someone please explain this in a really simple and educational way?
Does it mean that the loop starts with the outer table (R1
) and the scans the inner (R2
)?
I understand that for a R1
value that doesn't join with R2
, it should be replaced with a NULL
so the result set becomes (NULL, R2
). For me it seems impossible to return an R2
value when R1
does not join, for the reason that it cannot know which R2
value to return. But that's not the way it's explained. Or is it?
SQL Server does in fact optimize (and often replaces) RIGHT JOIN
with LEFT JOIN
, but the question is to explain why it's technically impossible for a NESTED LOOPS JOIN
to use/support RIGHT JOIN
logic.
Best Answer
The main issue here is the implementation of an outer join, using nested loops, in a technical way which is opposite to the logical way, where the inner table is accessed through the outer loop and the outer table is accessed through the inner loop.
Given tables A and B, let's implement
A LEFT JOIN B
.First, let's do it in the "natural" way.
We iterate through A.
We access record 1.
We iterate through B.
We find record 1 in B and output 1-1.
We keep iterating through A.
We access record 2.
We iterate through B.
We don't find any match in B.
We output 2-null.
Now, let's do it in the "opposite" way.
We iterate through B.
We access record 1.
We iterate through A.
We find record 1 in A and output 1-1.
We keep iterating through B.
We access record 3.
We iterate through A.
We don't find any match in A.
Now remember that it was
A LEFT JOIN B
, which means that in addition to 1-1 we should output 2-null.The problem is that at that point, we have no idea for which records id A we already have a match (1) and for which records we don't (2).
This can actually be solved in various ways e.g. by holding a bit array for table A.
When an A record is being found as a match we mark it in the bit array.
At the end of the nested loops we are going through the bit array and output and output any record that was not marked.
This is obviously more complicated than the "natural" nested loop.