Sql-server – Logical reads different when accessing the same LOB data

blobdatabase-internalsperformancesql server

Here are three simple tests that read the same data, yet report very different logical reads:

Setup

The following script creates a test table with 100 identical rows, each containing an xml column with enough data to ensure it is stored off row. In my test database, the length of the xml generated is 20,204 bytes for each row.

-- Conditional drop
IF OBJECT_ID(N'dbo.XMLTest', N'U') IS NOT NULL
    DROP TABLE dbo.XMLTest;
GO
-- Create test table
CREATE TABLE dbo.XMLTest
(
    ID integer IDENTITY PRIMARY KEY,
    X xml NULL
);
GO
-- Add 100 wide xml rows
DECLARE @X xml;

SET @X =
(
    SELECT TOP (100) *
    FROM  sys.columns AS C
    FOR XML 
        PATH ('row'),
        ROOT ('root'),
        TYPE
);

INSERT dbo.XMLTest
    (X)
SELECT TOP (100)
    @X
FROM  sys.columns AS C;

-- Flush dirty buffers
CHECKPOINT;

Tests

The following three tests read the xml column with:

  1. A plain SELECT statement
  2. Assigning the xml to a variable
  3. Using SELECT INTO to create a temporary table
-- No row count messages or graphical plan
-- Show I/O statistics
SET NOCOUNT ON;
SET STATISTICS XML OFF;
SET STATISTICS IO ON;
GO
PRINT CHAR(10) + '=== Plain SELECT ===='

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

SELECT XT.X 
FROM dbo.XMLTest AS XT;
GO
PRINT CHAR(10) + '=== Assign to a variable ===='

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

DECLARE @X xml;

SELECT
    @X = XT.X
FROM dbo.XMLTest AS XT;
GO
PRINT CHAR(10) + '=== SELECT INTO ===='

IF OBJECT_ID(N'tempdb..#T', N'U') IS NOT NULL
    DROP TABLE #T;

DBCC DROPCLEANBUFFERS WITH NO_INFOMSGS;

SELECT 
    XT.X
INTO #T
FROM dbo.XMLTest AS XT
GO
SET STATISTICS IO OFF;

Results

The output is:

=== Plain SELECT ====
Table 'XMLTest'. Scan count 1, logical reads 3, physical reads 1, read-ahead reads 0, 
    lob logical reads 795, lob physical reads 37, lob read-ahead reads 796.

=== Assign to a variable ====
Table 'XMLTest'. Scan count 1, logical reads 3, physical reads 1, read-ahead reads 0, 
    lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.

=== SELECT INTO ====
Table 'XMLTest'. Scan count 1, logical reads 3, physical reads 1, read-ahead reads 0, 
    lob logical reads 300, lob physical reads 37, lob read-ahead reads 400.

Questions

  • Why are the LOB reads so different?
  • Surely the exact same data was read in each test?

Best Answer

Not all reads are equal. SQL Server knows that accessing LOB data is expensive, and tries to avoid it when possible. There are also detailed differences in the way the LOB data is read in each case:

Summary

The numbers are different because:

  • The select reads the LOB in packet-sized chunks
  • The variable assignment test does not read the LOB at all
  • The "select into" test reads the LOB in whole pages

Detail

  1. Plain SELECT

    Select plan

    The Clustered Index Scan does not read any LOB data. It only assigns a storage engine LOB handle. The handle isn't used until control returns to the root of the plan.

    The current row's LOB content is read in TDS packet sized chunks and streamed to the client. Logical reads count the number of times a page is touched, so:

    The number of reads reported equals the number of chunked reads performed, plus one for each time a LOB page transition occurs.

    For example: A logical read is counted at the start of each chunk as the process touches the page corresponding to the current position of the stream. Where packets are smaller than a database page (the usual case), several logical reads are counted for the same page. If the packet size were so large that the entire LOB could fit in one chunk, the number of logical reads reported would be the number of LOB pages.

  2. Variable assignment

    Variable plan

    The Clustered Index Scan assigns a LOB handle as before. At the root of the plan, the LOB handle is copied to the variable. The LOB data itself is never accessed (zero LOB reads), because the variable is never read. Even if it were, it would only be via the LOB handle last assigned.

    There are no LOB reads because the LOB data is never accessed.

  3. SELECT INTO

    Select Into Plan

    This plan uses the bulk rowset provider to copy the LOB data from the source table to the new table. It processes a complete LOB page on each read (no streaming or chunking).

    The number of logical reads corresponds to the number of LOB pages in the test table.