Disk page reads for Index Nested Loop Join

indexjoin;

I apologise if this is the wrong place to post this question.

In my Advanced Databases course I'm taking, we were taught that the number of disk page reads done when doing an index nested loop join when you have enough buffer space for 3 pages is:

Disk page reads for R INLJ S = |Pages(R)| + |R|.Depth(Index on S) + |R Join S|

I understand that we need to read in all the pages of R as the outer loop and then we have to traverse the depth of the B+ tree to get the leaf pages (and do this |R| times) but why do we need to add the number of times R joins with S?

Best Answer

I never took a class like that but I believe the idea is that it takes additional reads to fetch the data pages when you find matching rows through the index. Let's run a quick test in SQL Server to see if we get numbers close to your formula.

First I'll create tables and insert sample data into them. I want to test three different queries: one with 0 rows returned, one with 5000 rows returned, and one with 10000 rows returned.

-- create tables
CREATE TABLE X_OUTER_TABLE (NUM INT);
CREATE TABLE X_INNER_TABLE_0_MATCHES (NUM INT, FILLER VARCHAR(100));
CREATE INDEX IX_X_INNER_TABLE_0_MATCHES ON X_INNER_TABLE_0_MATCHES (NUM);

CREATE TABLE X_INNER_TABLE_5000_MATCHES (NUM INT, FILLER VARCHAR(100));
CREATE INDEX IX_X_INNER_TABLE_5000_MATCHES ON X_INNER_TABLE_5000_MATCHES (NUM);

CREATE TABLE X_INNER_TABLE_10000_MATCHES (NUM INT, FILLER VARCHAR(100));
CREATE INDEX IX_X_INNER_TABLE_10000_MATCHES ON X_INNER_TABLE_10000_MATCHES (NUM);

-- populate data
WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
INSERT INTO X_OUTER_TABLE WITH (TABLOCK)
select N from cteTally where N BETWEEN 1 AND 10000;

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
INSERT INTO X_INNER_TABLE_0_MATCHES WITH (TABLOCK)
select N, REPLICATE('Z', 100) from cteTally where N BETWEEN 10001 AND 60000;

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
INSERT INTO X_INNER_TABLE_5000_MATCHES WITH (TABLOCK)
select N, REPLICATE('Z', 100) from cteTally where N BETWEEN 5001 AND 65000;

WITH E00(N) AS (SELECT 1 UNION ALL SELECT 1),
        E02(N) AS (SELECT 1 FROM E00 a, E00 b),
        E04(N) AS (SELECT 1 FROM E02 a, E02 b),
        E08(N) AS (SELECT 1 FROM E04 a, E04 b),
        E16(N) AS (SELECT 1 FROM E08 a, E08 b),
        E32(N) AS (SELECT 1 FROM E16 a, E16 b),
   cteTally(N) AS (SELECT ROW_NUMBER() OVER (ORDER BY N) FROM E32)
INSERT INTO X_INNER_TABLE_10000_MATCHES WITH (TABLOCK)
select N, REPLICATE('Z', 100) from cteTally where N BETWEEN 1 AND 50000;

I need to get some information about the index depth and the number of pages in the tables as well as do some additional setup to get cleaner numbers:

-- index depth of 2
SELECT INDEXPROPERTY ( object_ID('X_INNER_TABLE_10000_MATCHES'), 'IX_X_INNER_TABLE_10000_MATCHES' , 'IndexDepth');

 -- 19 used pages, 26 reserved
SELECT OBJECT_NAME(s.object_id) table_name,
        SUM(s.used_page_count) used_pages,
        SUM(s.reserved_page_count) reserved_pages
FROM    sys.dm_db_partition_stats s
WHERE OBJECT_NAME(s.object_id) IN ('X_OUTER_TABLE')
GROUP BY s.object_id;

-- get logical reads after query execution
SET STATISTICS IO ON;

-- Trace flag 8744: Disable pre-fetching for ranges
DBCC TRACEON(8744);

Based on the formula I expect the following results:

Join X_OUTER_TABLE to X_INNER_TABLE_0_MATCHES: 19 + 2 * 10000 + 0 = 20019

Join X_OUTER_TABLE to X_INNER_TABLE_5000_MATCHES: 19 + 2 * 10000 + 5000 = 25019

Join X_OUTER_TABLE to X_INNER_TABLE_10000_MATCHES: 19 + 2 * 10000 + 10000 = 30019

Run the SELECT queries:

SELECT INNER_TABLE.FILLER
FROM X_OUTER_TABLE OUTER_TABLE
INNER LOOP JOIN X_INNER_TABLE_0_MATCHES INNER_TABLE ON OUTER_TABLE.NUM = INNER_TABLE.NUM;

/*
(0 row(s) affected)
Table 'X_INNER_TABLE_0_MATCHES'. Scan count 10000, logical reads 20000, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'X_OUTER_TABLE'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
*/


SELECT INNER_TABLE.FILLER
FROM X_OUTER_TABLE OUTER_TABLE
INNER LOOP JOIN X_INNER_TABLE_5000_MATCHES INNER_TABLE ON OUTER_TABLE.NUM = INNER_TABLE.NUM;

/*
(5000 row(s) affected)
Table 'X_INNER_TABLE_5000_MATCHES'. Scan count 10000, logical reads 25022, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'X_OUTER_TABLE'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
*/


SELECT INNER_TABLE.FILLER
FROM X_OUTER_TABLE OUTER_TABLE
INNER LOOP JOIN X_INNER_TABLE_10000_MATCHES INNER_TABLE ON OUTER_TABLE.NUM = INNER_TABLE.NUM;

/*
(10000 row(s) affected)
Table 'X_INNER_TABLE_10000_MATCHES'. Scan count 10000, logical reads 30044, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'X_OUTER_TABLE'. Scan count 1, logical reads 17, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
*/

Actual results:

Join X_OUTER_TABLE to X_INNER_TABLE_0_MATCHES: 20017 (predicted 20019)

Join X_OUTER_TABLE to X_INNER_TABLE_5000_MATCHES: 25039 (predicted 25019)

Join X_OUTER_TABLE to X_INNER_TABLE_10000_MATCHES: 30061 (predicted 30019)

I think that's pretty close!

Clean up script for completeness:

DBCC TRACEOFF(8744);

DROP TABLE X_OUTER_TABLE;
DROP TABLE X_INNER_TABLE_0_MATCHES;
DROP TABLE X_INNER_TABLE_5000_MATCHES;
DROP TABLE X_INNER_TABLE_10000_MATCHES;

Related Solutions

Sql-server – Aren’t two writes required to update a clustered index record

I think the problem here is a difference in terminology.

The "number of writes" that's usually referred to is the number of object accesses, rather than the number of pages that get touched by the physical operation.

The reason why that's usually used as a metric in discussion is because it's a more "stable" and meaningful number to talk about. As we're getting into here, the number of pages touched by an INSERT statement for even a single row depends on many factors, so it's not a very useful quantity outside your own environment and situation.

The one thing I would pick at from the article quote is this (emphasis mine):

One write for inserting the row, and one write for updating the non-clustered index.

This may be confusing. Inserting a row into the base table would involve an insert to the base table, and also an insert into each nonclustered index (ignoring special index features), not an update.

So if a record has to be updated, say the value 1 has to be updated to 7, won't the update need to be applied to both the key in the clustered index top node (this may, in cases, cause a re-structuring of the entire structure) and the corresponding value in the record in the leaf-page?

Yes, assuming the column that was updated is in the index key. However, this is still a single object access, and hence a "single write."

Sql-server – Index not making execution faster, and in some cases is slowing down the query. Why is it so

Even though the index is suggested by the SQL Server, why does it slow things down by a significant difference?

Index suggestions are made by the query optimizer. If it comes across a logical selection from a table which is not well served by an existing index, it may add a "missing index" suggestion to its output. These suggestions are opportunistic; they are not based on a full analysis of the query, and do not take account of wider considerations. At best, they are an indication that more helpful indexing may be possible, and a skilled DBA should take a look.

The other thing to say about missing index suggestions is that they are based on the optimizer's costing model, and the optimizer estimates by how much the suggested index might reduce the estimated cost of the query. The key words here are "model" and "estimates". The query optimizer knows little about your hardware configuration or other system configuration options - its model is largely based on fixed numbers that happen to produce reasonable plan outcomes for most people on most systems most of the time. Aside from issues with the exact cost numbers used, the results are always estimates - and estimates can be wrong.

What is the Nested Loop join which is taking most of the time and how to improve its execution time?

There is little to be done to improve the performance of the cross join operation itself; nested loops is the only physical implementation possible for a cross join. The table spool on the inner side of the join is an optimization to avoid rescanning the inner side for each outer row. Whether this is a useful performance optimization depends on various factors, but in my tests the query is better off without it. Again, this is a consequence of using a cost model - my CPU and memory system likely has different performance characteristics than yours. There is no specific query hint to avoid the table spool, but there is an undocumented trace flag (8690) that you can use to test execution performance with and without the spool. If this were a real production system problem, the plan without the spool could be forced using a plan guide based on the plan produced with TF 8690 enabled. Using undocumented trace flags in production is not advised because the installation becomes technically unsupported and trace flags can have undesirable side-effects.

Is there something that I am doing wrong or have missed?

The main thing you are missing is that although the plan using the nonclustered index has a lower estimated cost according to the optimizer's model, it has a significant execution-time problem. If you look at the distribution of rows across threads in the plan using the Clustered Index, you will likely see a reasonably good distribution:

Scan plan

In the plan using the Nonclustered Index Seek, the work ends up being performed entirely by one thread:

Seek plan

This is a consequence of the way work is distributed among threads by parallel scan/seek operations. It is not always the case that a parallel scan will distribute work better than an index seek - but it does in this case. More complex plans might include repartitioning exchanges to redistribute work across threads. This plan has no such exchanges, so once rows are assigned to a thread, all related work is performed on that same thread. If you look at the work distribution for the other operators in the execution plan, you will see that all work is performed by the same thread as shown for the index seek.

There are no query hints to affect row distribution among threads, the important thing is to be aware of the possibility and to be able to read enough detail in the execution plan to determine when it is causing a problem.

With the default index (on primary key only) why does it take less time, and with the non clustered index present, for each row in the joining table, the joined table row should be found quicker, because join is on Name column on which the index has been created. This is reflected in the query execution plan and Index Seek cost is less when IndexA is active, but why still slower? Also what is in the Nested Loop left outer join that is causing the slowdown?

It should now be clear that the nonclustered index plan is potentially more efficient, as you would expect; it is just poor distribution of work across threads at execution time that accounts for the performance issue.

For the sake of completing the example and illustrating some of the things I have mentioned, one way to get a better work distribution is to use a temporary table to drive parallel execution:

SELECT
    val1,
    val2
INTO #Temp
FROM dbo.IndexTestTable AS ITT
WHERE Name = N'Name1';

SELECT 
    N'Name1',
    SUM(T.val1),
    SUM(T.val2),
    MIN(I2.Name),
    SUM(I2.val1),
    SUM(I2.val2)
FROM   #Temp AS T
CROSS JOIN IndexTestTable I2
WHERE
    I2.Name = 'Name1'
OPTION (FORCE ORDER, QUERYTRACEON 8690);

DROP TABLE #Temp;

This results in a plan that uses the more efficient index seeks, does not feature a table spool, and distributes work across threads well:

Optimal plan

On my system, this plan executes significantly faster than the Clustered Index Scan version.

If you're interested in learning more about the internals of parallel query execution, you might like to watch my PASS Summit 2013 session recording.

Best Answer

Related Solutions

Sql-server – Aren’t two writes required to update a clustered index record

Sql-server – Index not making execution faster, and in some cases is slowing down the query. Why is it so

Related Question