Sql-server – the most effective SELECTIVE XML index

indexperformancequery-performancesql serverxml

I am trying to determine the most effective selective XML index for an XML column within a table of 100 million rows. My question is similar to :

Sql Server 2012 Extended Events for Selective Xml Indexing not showing results

but the specific benefits of the selective index specified in the answer are not discussed in detail.

Below is a simplified version of the query:

;WITH CTE AS
( SELECT 1 AS ID,
              CONVERT(XML, '<Root>
                              <ParentTag ParentTagID="Sample Text">
                                <ChildTag1>5</ChildTag1>
                                <ChildTag1>6</ChildTag1>
                                <ChildTag1>7</ChildTag1>
                                <ChildTag2>8</ChildTag2>
                                <ChildTag2>9</ChildTag2>
                                <ChildTag2>10</ChildTag2>
                                <OtherTag>LargeIrrelevantData</OtherTag>
                              </ParentTag>
                            </Root>'
                      ) AS SampleXML
)
SELECT * INTO dbo.CTE FROM CTE

SELECT ID,
       Root.ParentTag.value('@ParentTagID','NVARCHAR(MAX)') AS ParentTagID,
       RootParentTag1.ChildTag1.value('(text())[1]', 'NVARCHAR(MAX)') AS ChildTag1,
       NULL
FROM CTE
OUTER APPLY CTE.SampleXML.nodes('/Root/ParentTag') as Root(ParentTag)
OUTER APPLY Root.ParentTag.nodes('ChildTag1') as RootParentTag1(ChildTag1)
UNION
SELECT ID,
       Root.ParentTag.value('@ParentTagID','NVARCHAR(MAX)') AS ParentTagID,
       NULL,
       RootParentTag2.ChildTag2.value('(text())[1]', 'NVARCHAR(MAX)') AS ChildTag2
FROM CTE
OUTER APPLY CTE.SampleXML.nodes('/Root/ParentTag') as Root(ParentTag)
OUTER APPLY Root.ParentTag.nodes('ChildTag2') as RootParentTag2(ChildTag2)

Is the most efficient index:

EXEC sp_db_selective_xml_index [DATABASE], TRUE
ALTER TABLE CTE ADD CONSTRAINT PK_CTE PRIMARY KEY CLUSTERED (ID)

CREATE SELECTIVE XML INDEX SIX_CTE ON CTE(SampleXML)
FOR
( ParentTag = '/Root/ParentTag' AS XQUERY 'node()',
  ParentTagChildTag1 = '/Root/ParentTag/ChildTag1' AS XQUERY 'node()',
  ParentTagChildTag2 = '/Root/ParentTag/ChildTag2' AS XQUERY 'node()',
  ParentTagID = '/Root/ParentTag/@ParentTagID' AS XQUERY 'xs:string' MAXLENGTH(255),
      ChildTag1 = '/Root/ParentTag/ChildTag1/text()' AS XQUERY 'xs:double',
      ChildTag2 = '/Root/ParentTag/ChildTag2/text()' AS XQUERY 'xs:double'
)

I do not want the LargeIrrelevantData, inside the <OtherTag>, to be indexed.

I'm using the following query to track indexed columns inside the XML

SELECT
a.name AS column_name,
c.name AS data_type
FROM sys.columns a
INNER JOIN sys.indexes b
ON a.object_id = b.object_id
INNER JOIN sys.types c
ON a.system_type_id = c.user_type_id
WHERE b.name = 'SIX_CTE'
AND b.type = 1
ORDER BY column_id

A few related questions:

How else can I determine the performance increase of my selective index besides reviewing query execution times?

Is it preferred to use XQUERY's xs:double or SQL's INT when specifying data types?

Thank you.

Best Answer

How else can I determine the performance increase of my selective index besides reviewing query execution times?

You could also look at IO utilization and CPU. Perhaps most interesting when dealing with selective XML indexes is to figure out if the index is actually used by the query. To do that you can have a look at the execution plan.

If you see one of the Table-valued functions responsible for parsing the XML then you know that you have not covered your XML query with a XML index.

enter image description here

What you should have instead is a scan or seek operator on the internal table used for the XML index.

enter image description here

Is it preferred to use XQUERY's xs:double or SQL's INT when specifying data types?

If you want the index to be used for the value() function you should match the data type in the index with the data type used in the query. You have nvarchar(max) in the query and xs:double in the index so ChildTag1 and ChildTag2 is not used in your query. It is instead parsing the XML data using the table valued functions.

To make your query use only the indexes (no xml parsing functions) you can change the query and the index to this.

create selective xml index SIX_CTE on CTE(SampleXML) for
(
  ParentTag = '/Root/ParentTag' as xquery 'node()',
  ParentTagID = '/Root/ParentTag/@ParentTagID' as sql nvarchar(255),
  ChildTag1 = '/Root/ParentTag/ChildTag1' as sql int,
  ChildTag2 = '/Root/ParentTag/ChildTag2' as sql int
);

Note that the node() indexes on ChildTag1 and ChildTag2 is not used by your query so not needed in the index.

SELECT ID,
       Root.ParentTag.value('@ParentTagID','NVARCHAR(255)') AS ParentTagID,
       RootParentTag1.ChildTag1.value('.', 'INT') AS ChildTag1,
       NULL
FROM CTE
OUTER APPLY CTE.SampleXML.nodes('/Root/ParentTag') as Root(ParentTag)
OUTER APPLY Root.ParentTag.nodes('ChildTag1') as RootParentTag1(ChildTag1)
UNION
SELECT ID,
       Root.ParentTag.value('@ParentTagID','NVARCHAR(255)') AS ParentTagID,
       NULL,
       RootParentTag2.ChildTag2.value('.', 'INT') AS ChildTag2
FROM CTE
OUTER APPLY CTE.SampleXML.nodes('/Root/ParentTag') as Root(ParentTag)
OUTER APPLY Root.ParentTag.nodes('ChildTag2') as RootParentTag2(ChildTag2);

What is the most effective SELECTIVE XML index?

Depends entirely on your use case and what the data looks like that you have. This answer is in no way an attempt to optimize your query, only a way to show you how to make use of the indexes. The result of testing on your data could show that you get the best performance by not using any indexes at all, or perhaps use the index only on some parts of the query. As a general rule it is always good to keep an index as small as possible so using int instead of float is a good thing if you know that you are dealing with integers.