Sql-server – XML column performance in TSQL

sql server 2014sql-server-2016t-sqlxml

I'm finding very limited reading on how a large xml column affects table performance in sql server 2014/6. For reference, it is a 36gig table and minus one xml column, it is 36megs. Some queries call for the column (usually single row requests), most do not. Is there a significant performance benefit to breaking that xml column out to its own linked table?

Best Answer

The database engine is pretty smart about this and does not read LOB pages unless it has to as far as I can tell, but I created a test-rig to double-check.

Test Rig

USE master
GO

SET NOCOUNT ON
GO

CREATE DATABASE bigXMLTest
GO
ALTER DATABASE bigXMLTest SET RECOVERY SIMPLE
GO
ALTER DATABASE bigXMLTest MODIFY FILE ( NAME = N'bigXMLTest', SIZE = 20GB , FILEGROWTH = 128MB )
GO
ALTER DATABASE bigXMLTest MODIFY FILE ( NAME = N'bigXMLTest_log', SIZE = 512MB )
GO


USE bigXMLTest
GO

IF OBJECT_ID('dbo.largeTable') IS NOT NULL DROP TABLE dbo.largeTable
GO

CREATE TABLE dbo.largeTable (

    rowId INT IDENTITY PRIMARY KEY,

    someDate    DATETIME DEFAULT GETDATE(),
    someData    UNIQUEIDENTIFIER DEFAULT NEWID(),
    someXML     XML NOT NULL
)
GO


-- Add dummy data
;WITH cte AS (
SELECT TOP 100000 ROW_NUMBER() OVER ( ORDER BY ( SELECT 1 ) ) rn
FROM master.sys.columns c1
    CROSS JOIN master.sys.columns c2
    CROSS JOIN master.sys.columns c3
)
INSERT INTO dbo.largeTable ( someDate, someData, someXML )
SELECT
    DATEADD( day, rn % 333, '1 Jan 2016' ), 
    NEWID(), 
    ( SELECT TOP 100 * FROM sys.messages WHERE language_id = 1033 FOR XML PATH, ROOT('root'), TYPE )    -- creates 24k size piece of XML on my machine
FROM cte

CHECKPOINT
GO 10

EXEC sp_spaceused 'dbo.largeTable'
GO

This creates a test table with 1 million rows, the non-XML data around 36MB and the XML data about 23GB.

Test Queries

--DROP INDEX _udx ON dbo.largeTable

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SELECT *
FROM dbo.largeTable

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SELECT rowId, someDate, someData    --, someXML
FROM dbo.largeTable

-- Add covering index for the query
CREATE UNIQUE INDEX _udx ON dbo.largeTable ( rowId ) INCLUDE ( someDate, someData )

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SELECT *
FROM dbo.largeTable

DBCC FREEPROCCACHE
DBCC DROPCLEANBUFFERS

SELECT rowId, someDate, someData    --, someXML
FROM dbo.largeTable

Test Results

Test results

Analysis

Query 1: A select *... causes a full table scan, so the database engine does a high number of LOB reads to read the XML which is stored out of row in LOB pages. XML can be stored in-row if it's small enough, but the default is out-of-row. This query takes 5 minutes, ie if you are reading LOB pages on a table like this, you will know about it.

Query 2 A select with column list not including the XML column, has the same number of logical reads in the top window (11,154), but no LOB reads. The query takes less than a second.

Query 3: After the non-clustered index has been added, a select * still causes a full table scan as expected. This query behaves the same as Query 1.

Query 4: After the non-clustered index has been added, it covers the select with column list query so there is a non-clustered index scan with no LOB reads and lower logical reads overall.

The LOB Reads looks like a broad categorisation as the XML storage pages are classed as TEXT_MIX_PAGE in the sys.dm_db_database_page_allocations DMV:

DMV results

In summary, it looks like there would be little benefit to splitting the same as the database engine can differentiate when and when not to perform LOB reads in this admittedly simple test rig. See here for some further details.