Disk page reads for Index Nested Loop Join

indexjoin;

I apologise if this is the wrong place to post this question.

In my Advanced Databases course I'm taking, we were taught that the number of disk page reads done when doing an index nested loop join when you have enough buffer space for 3 pages is:

Disk page reads for R INLJ S = |Pages(R)| + |R|.Depth(Index on S) + |R Join S|

I understand that we need to read in all the pages of R as the outer loop and then we have to traverse the depth of the B+ tree to get the leaf pages (and do this |R| times) but why do we need to add the number of times R joins with S?

Best Answer

I never took a class like that but I believe the idea is that it takes additional reads to fetch the data pages when you find matching rows through the index. Let's run a quick test in SQL Server to see if we get numbers close to your formula.

First I'll create tables and insert sample data into them. I want to test three different queries: one with 0 rows returned, one with 5000 rows returned, and one with 10000 rows returned.

-- create tables
CREATE TABLE X_OUTER_TABLE (NUM INT);
CREATE TABLE X_INNER_TABLE_0_MATCHES (NUM INT, FILLER VARCHAR(100));
CREATE INDEX IX_X_INNER_TABLE_0_MATCHES ON X_INNER_TABLE_0_MATCHES (NUM);

CREATE TABLE X_INNER_TABLE_5000_MATCHES (NUM INT, FILLER VARCHAR(100));
CREATE INDEX IX_X_INNER_TABLE_5000_MATCHES ON X_INNER_TABLE_5000_MATCHES (NUM);

CREATE TABLE X_INNER_TABLE_10000_MATCHES (NUM INT, FILLER VARCHAR(100));
CREATE INDEX IX_X_INNER_TABLE_10000_MATCHES ON X_INNER_TABLE_10000_MATCHES (NUM);

-- populate data
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
INSERT INTO X_OUTER_TABLE WITH (TABLOCK)
select N from cteTally where N BETWEEN 1 AND 10000;

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
INSERT INTO X_INNER_TABLE_0_MATCHES WITH (TABLOCK)
select N, REPLICATE('Z', 100) from cteTally where N BETWEEN 10001 AND 60000;

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
INSERT INTO X_INNER_TABLE_5000_MATCHES WITH (TABLOCK)
select N, REPLICATE('Z', 100) from cteTally where N BETWEEN 5001 AND 65000;

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
INSERT INTO X_INNER_TABLE_10000_MATCHES WITH (TABLOCK)
select N, REPLICATE('Z', 100) from cteTally where N BETWEEN 1 AND 50000;

I need to get some information about the index depth and the number of pages in the tables as well as do some additional setup to get cleaner numbers:

-- index depth of 2
SELECT INDEXPROPERTY ( object_ID('X_INNER_TABLE_10000_MATCHES'), 'IX_X_INNER_TABLE_10000_MATCHES' , 'IndexDepth');

 -- 19 used pages, 26 reserved
SELECT OBJECT_NAME(s.object_id) table_name,
        SUM(s.used_page_count) used_pages,
        SUM(s.reserved_page_count) reserved_pages
FROM    sys.dm_db_partition_stats s
WHERE OBJECT_NAME(s.object_id) IN ('X_OUTER_TABLE')
GROUP BY s.object_id;

-- get logical reads after query execution
SET STATISTICS IO ON;

-- Trace flag 8744: Disable pre-fetching for ranges
DBCC TRACEON(8744);

Based on the formula I expect the following results:

Join X_OUTER_TABLE to X_INNER_TABLE_0_MATCHES: 19 + 2 * 10000 + 0 = 20019

Join X_OUTER_TABLE to X_INNER_TABLE_5000_MATCHES: 19 + 2 * 10000 + 5000 = 25019

Join X_OUTER_TABLE to X_INNER_TABLE_10000_MATCHES: 19 + 2 * 10000 + 10000 = 30019

Run the SELECT queries:

SELECT INNER_TABLE.FILLER
FROM X_OUTER_TABLE OUTER_TABLE
INNER LOOP JOIN X_INNER_TABLE_0_MATCHES INNER_TABLE ON OUTER_TABLE.NUM = INNER_TABLE.NUM;

/*
(0 row(s) affected)
Table 'X_INNER_TABLE_0_MATCHES'. Scan count 10000, logical reads 20000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'X_OUTER_TABLE'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
*/


SELECT INNER_TABLE.FILLER
FROM X_OUTER_TABLE OUTER_TABLE
INNER LOOP JOIN X_INNER_TABLE_5000_MATCHES INNER_TABLE ON OUTER_TABLE.NUM = INNER_TABLE.NUM;

/*
(5000 row(s) affected)
Table 'X_INNER_TABLE_5000_MATCHES'. Scan count 10000, logical reads 25022, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'X_OUTER_TABLE'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
*/


SELECT INNER_TABLE.FILLER
FROM X_OUTER_TABLE OUTER_TABLE
INNER LOOP JOIN X_INNER_TABLE_10000_MATCHES INNER_TABLE ON OUTER_TABLE.NUM = INNER_TABLE.NUM;

/*
(10000 row(s) affected)
Table 'X_INNER_TABLE_10000_MATCHES'. Scan count 10000, logical reads 30044, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'X_OUTER_TABLE'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
*/

Actual results:

Join X_OUTER_TABLE to X_INNER_TABLE_0_MATCHES: 20017 (predicted 20019)

Join X_OUTER_TABLE to X_INNER_TABLE_5000_MATCHES: 25039 (predicted 25019)

Join X_OUTER_TABLE to X_INNER_TABLE_10000_MATCHES: 30061 (predicted 30019)

I think that's pretty close!

Clean up script for completeness:

DBCC TRACEOFF(8744);

DROP TABLE X_OUTER_TABLE;
DROP TABLE X_INNER_TABLE_0_MATCHES;
DROP TABLE X_INNER_TABLE_5000_MATCHES;
DROP TABLE X_INNER_TABLE_10000_MATCHES;