I'm finding very limited reading on how a large XML column affects table performance in SQL Server 2014/2016. For reference, the table is 36 GB; minus one XML column, it is 36 MB. Some queries request the column (usually single-row requests), but most do not. Is there a significant performance benefit to breaking that XML column out into its own linked table?
SQL Server – XML column performance in T-SQL
sql-server-2014, sql-server-2016, t-sql, xml
Related Solutions
I would use DATETIME2 as opposed to DATETIME for any new development work at this point. If you don't need sub-second granularity, use DATETIME2(0) and you'll see some space savings as well.
From here I would start experimenting. If performance is poor just querying off the single column, I'd then try adding persisted computed columns to separate out the date and time components. This will ensure that the separated DATE and TIME columns will always be the same as the "master" DATETIME2(n) column.
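A minimal sketch of that approach, assuming a hypothetical dbo.Events table with an EventDateTime DATETIME2 column (the names here are illustrative, not from the original answer):

-- Persisted computed columns derived from the "master" DATETIME2 column;
-- being computed, they can never drift out of sync with it.
ALTER TABLE dbo.Events
ADD EventDate AS CAST(EventDateTime AS date) PERSISTED,
    EventTime AS CAST(EventDateTime AS time(0)) PERSISTED;

-- The persisted date column can then be indexed for date-range queries.
CREATE NONCLUSTERED INDEX IX_Events_EventDate ON dbo.Events (EventDate);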
Is there any plausible benefit to the spool iterator in the first plan?
This depends on what you regard as "plausible", but the answer according to the cost model is yes. Of course this is true, because the optimizer always chooses the cheapest plan it finds.
The real question is why the cost model considers the plan with the spool so much cheaper than the plan without. Consider estimated plans created for a fresh table (from your script) before any rows have been added to the delta store:
DELETE Fact.RecordedMetricsDetail
WHERE MeasurementTime < DATEADD(day,-1,GETUTCDATE())
OPTION (RECOMPILE);
The estimated cost for this plan is a huge 771,734 units.
The cost is almost all associated with the Clustered Index Delete, because the deletions are expected to result in a great deal of random I/O. This is just the generic logic that applies to all data modifications. For example, an unordered set of modifications to a b-tree index are assumed to result in largely random I/O, with an associated high I/O cost.
Data-changing plans may feature a Sort to present rows in an order that will promote sequential access, for exactly these cost reasons. The impact is exacerbated in this case because the table is partitioned. Very partitioned, in fact: your script creates 15,000 partitions. Random updates to a heavily partitioned table are costed especially high, since switching partitions (rowsets) mid-stream also carries a high price.
The last major factor to consider is that the simple update query above (where 'update' means any data-changing operation, including a delete) qualifies for an optimization called "rowset sharing", where the same internal rowset is used for both scanning and updating the table. The execution plan still shows two separate operators, but nevertheless, there is only one rowset used.
I mention this because being able to apply this optimization means the optimizer takes a code path that simply does not consider the potential benefits of explicitly sorting to reduce the cost of random I/O. Where the table is a b-tree, this makes sense, because the structure is inherently ordered, so sharing the rowset provides all the potential benefits automatically.
The important consequence is that the costing logic for the update operator does not consider this ordering benefit (promoting sequential I/O or other optimizations) where the underlying object is column store. This is because column store modifications are not performed in-place; they use a delta store. The cost model is therefore reflecting a difference between shared-rowset updates on b-trees versus columnstores.
Nevertheless, in the special case of a (very!) partitioned columnstore, there might still be a benefit to preserved ordering, in that performing all updates to one partition before moving to the next might still be advantageous from an I/O point of view.
The standard cost logic is reused for column stores here, so a plan that preserves partition ordering (though not order within each partition) is costed lower. We can see this on the test query by using undocumented trace flag 2332 to require sorted input to the update operator. This sets the DMLRequestSort property to true at the update, and forces the optimizer to produce a plan that provides all rows for one partition before moving to the next:
DELETE Fact.RecordedMetricsDetail
WHERE MeasurementTime < DATEADD(day,-1,GETUTCDATE())
OPTION (RECOMPILE, QUERYTRACEON 2332);
The estimated cost for this plan is very much lower, at 52.5174 units.
This reduction in cost is all due to the lower estimated I/O cost at the update. The introduced Spool performs no useful function, except that it can guarantee output in partition order, as required by the update with DMLRequestSort = true (the serial scan of a column store index cannot provide this guarantee). The cost of the spool itself is considered relatively low, especially compared with the (probably unrealistic) reduction in cost at the update.
The decision about whether to require ordered input to the update operator is made very early on in query optimization. The heuristics used in this decision have never been documented, but can be determined through trial and error. It seems that the size of any delta stores is an input to this decision. Once made, the choice is permanent for the query compilation. No USE PLAN hint will succeed: the target of the plan either has ordered input to the update, or it does not.
There is another way to get a low-cost plan for this query without artificially limiting the cardinality estimate. A sufficiently low estimate to avoid the Spool will probably result in DMLRequestSort being false, resulting in a very high estimated plan cost due to the expected random I/O. An alternative is to use trace flag 8649 (parallel plan) in conjunction with 2332 (DMLRequestSort = true):
DELETE Fact.RecordedMetricsDetail
WHERE MeasurementTime < DATEADD(day,-1,GETUTCDATE())
OPTION (RECOMPILE, QUERYTRACEON 2332, QUERYTRACEON 8649);
This results in a plan that uses per-partition batch-mode parallel scan and an order-preserving (merging) Gather Streams exchange.
Depending on the run-time effectiveness of partition ordering on your hardware, this may perform best of the three. That said, large modifications are not a great idea on column store, so the partition-switching idea is almost certainly better, if you can cope with the long compilation times and quirky plan choices often seen with partitioned objects, especially when the number of partitions is large. A sketch of the switching approach follows.
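As a hedged sketch only: the staging table name and partition number below are assumptions, and the staging table must exactly match the source's structure, indexes (including the columnstore), and filegroup for the switch to succeed:

-- Switch the expired partition out in a metadata-only operation,
-- then discard its rows; far cheaper than a large DELETE.
ALTER TABLE Fact.RecordedMetricsDetail
SWITCH PARTITION 1 TO Fact.RecordedMetricsDetail_Staging;

TRUNCATE TABLE Fact.RecordedMetricsDetail_Staging;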
Combining many, relatively new, features, especially near their limits, is a great way to obtain poor execution plans. The depth of optimizer support tends to improve over time, but using 15,000 partitions of column store will likely always mean you live in interesting times.
Best Answer
The database engine is pretty smart about this: as far as I can tell, it does not read LOB pages unless it has to. But I created a test rig to double-check.
Test Rig
This creates a test table with 1 million rows: the non-XML data is around 36 MB, and the XML data is about 23 GB.
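A minimal sketch of such a rig (the table and column names are assumptions; scale the TOP and REPLICATE counts down for a quicker test):

CREATE TABLE dbo.XmlTest
(
    ID        int IDENTITY(1,1) NOT NULL PRIMARY KEY,
    SomeValue int NOT NULL,
    XmlData   xml NULL
);

-- 1,000,000 rows with a roughly 20 KB XML value per row, large enough
-- that the XML is stored out of row in LOB pages.
INSERT dbo.XmlTest (SomeValue, XmlData)
SELECT TOP (1000000)
       ABS(CHECKSUM(NEWID())),
       CAST('<r>' + REPLICATE(CAST('<e>filler element text</e>' AS varchar(max)), 800) + '</r>' AS xml)
FROM sys.all_columns AS ac1
CROSS JOIN sys.all_columns AS ac2;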
Test Queries
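The four queries analysed below, sketched against the assumed dbo.XmlTest table (STATISTICS IO reports the logical and LOB read counts discussed):

SET STATISTICS IO, TIME ON;

-- Query 1: select * reads the XML column, forcing LOB reads
SELECT * FROM dbo.XmlTest;

-- Query 2: column list excluding the XML column, so no LOB reads
SELECT ID, SomeValue FROM dbo.XmlTest;

-- Non-clustered index added before Queries 3 and 4
CREATE NONCLUSTERED INDEX IX_XmlTest_SomeValue ON dbo.XmlTest (SomeValue);

-- Query 3: select * still scans the clustered index with LOB reads
SELECT * FROM dbo.XmlTest;

-- Query 4: now covered by the non-clustered index; no LOB reads
SELECT ID, SomeValue FROM dbo.XmlTest;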
Test Results
Analysis
Query 1: A select * causes a full table scan, so the database engine performs a high number of LOB reads to fetch the XML, which is stored out of row in LOB pages. (XML can be stored in-row if it is small enough, but the default is out-of-row.) This query takes 5 minutes, i.e. if you are reading LOB pages on a table like this, you will know about it.

Query 2: A select with a column list that excludes the XML column has the same number of logical reads in the top window (11,154), but no LOB reads. The query takes less than a second.

Query 3: After the non-clustered index has been added, a select * still causes a full table scan, as expected. This query behaves the same as Query 1.

Query 4: After the non-clustered index has been added, it covers the column-list query, so there is a non-clustered index scan with no LOB reads and lower logical reads overall.

The LOB Reads counter looks like a broad categorisation, as the XML storage pages are classed as TEXT_MIX_PAGE in the sys.dm_db_database_page_allocations DMF; a sketch of that check follows the summary.

In summary, there appears to be little benefit to splitting the XML column out, as the database engine can differentiate when and when not to perform LOB reads, at least in this admittedly simple test rig. See here for some further details.
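For reference, a sketch of how the page types can be inspected for the test table above (again assuming the hypothetical dbo.XmlTest name):

-- Count pages by type; the XML LOB pages show up as TEXT_MIX_PAGE.
SELECT page_type_desc, COUNT(*) AS page_count
FROM sys.dm_db_database_page_allocations
     (DB_ID(), OBJECT_ID('dbo.XmlTest'), NULL, NULL, 'DETAILED')
GROUP BY page_type_desc
ORDER BY page_count DESC;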