Say you have two relations R and S , where R has 1000 tuples and 100 page-accesses, and S has 50 tuples and 25 page-accesses.
Assuming R is the outer relation, then how many tuple-comparisons and page-accesses are done?
And how many page-accesses if R is the inner relation?
for each tuple r in R do
for each tuple s in S do
if r and s satisfy the join condition
then output the tuple (r,s)
So in order to find out how many tuple-comparisons are done, I need to do 1000 * 50 = 50000 because the algorithm is doing this "for each" tuple and we have in total 1000 tuples for R and 50 tuples for S, thus 50000 comparisons in total.
But how to know the page-accesses now? If R is outside, we have (1000 tuples) * (25 page-accesses for S) + (100 page-accesses for R) = 25100 page accesses?
And if R is inside, then: 50 * 100 + 25 = 5025 page accesses
I'm not sure if it is correct like that.. or how is this done correctly pleaseee? :/
Best Answer
We can force SQL Server to do exactly this and see what actually happens.
R has 1000 tuples and 100 page-accesses = 10 tuples/page = 806 bytes/tuple.
S has 50 tuples and 25 page-accesses = 2 tuples/page = 4030 bytes/tuple.
These are the tables:
Filler columns sizes have been rounded down to allow for row overhead. Note that there are no indexes. I have a "numbers" table which I'll use to populate R and S:
We can check just how many pages are involved:
The messages tab of SSMS shows
We have just the right number of pages. A bit of jiggery-pokery will get the behaviour you want to examine
Which gives this in the messages tab (inner table is listed before the outer table)
In SQL Server query execution proceeds row-wise. For each row in the outer table the corresponding row(s) in the inner table will be referenced. Since there are no indexes the only option is to read all the rows (i.e. all the pages) from the inner table every time. For R-join-S we have 1,000 outer rows times 25 inner pages giving 25,000 inner page references plus, of course, 100 outer page references. For S-join-R there are 50 rows times 100 pages giving 5,000 inner page references plus 25 outer page references.
In terms of tuple comparisons you are correct - there will be O(R)xO(S) comparisons - 50,000. This is supported by looking at the query plan. For both queries the "Number of Rows Read" for the inner table references are both 50,000.
If there are indexes the query optimizer (QO) has choices other than a table scan. Rewinds may be used for duplicate outer keys. No pages may be read for non-matching keys. In the extreme case where a constraint says there cannot be any matches the inner table may not even be referenced.