Sql-server – If a VARCHAR(MAX) column is included in an index, is the entire value always stored in the index page(s)

sql servervarchar

I'm asking this out of curiosity, being inspired by this question.

We know that VARCHAR(MAX) values longer than 8000 bytes are not stored in rows, but in separate LOB pages. Subsequently retrieving a row with such value requires two or more logical IO operations (essentially, one more than otherwise would theoretically be necessary).

We can add a VARCHAR(MAX) column, as INCLUDEd, to a unique index, as demonstrated in the linked question. If this column has values that exceed 8000 bytes in length, would such values still be stored "inline" in the index leaf pages, or would they also be moved to LOB pages?

Best Answer

Values that exceed 8000 bytes cannot be stored "inline". They are stored on LOB pages. You can see this with sys.dm_db_index_physical_stats. Start with a simple table:

USE tempdb;

DROP TABLE IF EXISTS #LOB_FOR_ME;

CREATE TABLE #LOB_FOR_ME (
ID BIGINT,
MAX_VERNON_WAS_HERE VARCHAR(MAX) 
);

CREATE INDEX IX ON #LOB_FOR_ME (ID) INCLUDE (MAX_VERNON_WAS_HERE);

Now insert some rows with values that take 8000 bytes for the VARCHAR(MAX) column and check out the DMF:

USE tempdb;

INSERT INTO #LOB_FOR_ME
SELECT 1, REPLICATE('Z', 8000)
FROM master..spt_values;

SELECT index_level, index_type_desc, alloc_unit_type_desc, page_count, record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('#LOB_FOR_ME'), 2, NULL , 'DETAILED'); 

There are no LOB pages in the index:

╔═════════════╦════════════════════╦══════════════════════╦════════════╦══════════════╗
║ index_level ║  index_type_desc   ║ alloc_unit_type_desc ║ page_count ║ record_count ║
╠═════════════╬════════════════════╬══════════════════════╬════════════╬══════════════╣
║           0 ║ NONCLUSTERED INDEX ║ IN_ROW_DATA          ║       2540 ║         2540 ║
║           1 ║ NONCLUSTERED INDEX ║ IN_ROW_DATA          ║         18 ║         2540 ║
║           2 ║ NONCLUSTERED INDEX ║ IN_ROW_DATA          ║          1 ║           18 ║
╚═════════════╩════════════════════╩══════════════════════╩════════════╩══════════════╝

But if I add rows with values that take 8001 bytes:

USE tempdb;

INSERT INTO #LOB_FOR_ME
SELECT 2, REPLICATE(CAST('Z' AS VARCHAR(MAX)), 8001)
FROM master..spt_values;

SELECT index_level, index_type_desc, alloc_unit_type_desc, page_count, record_count
FROM sys.dm_db_index_physical_stats(DB_ID(), OBJECT_ID('#LOB_FOR_ME'), 2, NULL , 'DETAILED'); 

Now I have 1 LOB page in the index for every row that I just inserted:

╔═════════════╦════════════════════╦══════════════════════╦════════════╦══════════════╗
║ index_level ║  index_type_desc   ║ alloc_unit_type_desc ║ page_count ║ record_count ║
╠═════════════╬════════════════════╬══════════════════════╬════════════╬══════════════╣
║           0 ║ NONCLUSTERED INDEX ║ IN_ROW_DATA          ║       2556 ║         5080 ║
║           1 ║ NONCLUSTERED INDEX ║ IN_ROW_DATA          ║         18 ║         2556 ║
║           2 ║ NONCLUSTERED INDEX ║ IN_ROW_DATA          ║          1 ║           18 ║
║           0 ║ NONCLUSTERED INDEX ║ LOB_DATA             ║       2540 ║         2540 ║
╚═════════════╩════════════════════╩══════════════════════╩════════════╩══════════════╝

You can also see this with SET STATISTICS IO ON; and the right query. Consider the following query that only looks at rows with 8000 bytes:

SELECT SUM(LEN(MAX_VERNON_WAS_HERE))
FROM #LOB_FOR_ME
WHERE ID = 1;

Results upon executing:

Scan count 1, logical reads 2560, physical reads 0, read-ahead reads 0, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

If I instead query the rows with 8001 bytes:

SELECT SUM(LEN(MAX_VERNON_WAS_HERE))
FROM #LOB_FOR_ME
WHERE ID = 2;

Now I see lob reads:

Scan count 1, logical reads 20, physical reads 0, read-ahead reads 0, lob logical reads 5080, lob physical reads 0, lob read-ahead reads 0.