Sql-server – the most effective SELECTIVE XML index

indexperformancequery-performancesql serverxml

I am trying to determine the most effective selective XML index for an XML column within a table of 100 million rows. My question is similar to :

Sql Server 2012 Extended Events for Selective Xml Indexing not showing results

but the specific benefits of the selective index specified in the answer are not discussed in detail.

Below is a simplified version of the query:

;WITH CTE AS
( SELECT 1 AS ID,
              CONVERT(XML, '<Root>
                              <ParentTag ParentTagID="Sample Text">
                                <ChildTag1>5</ChildTag1>
                                <ChildTag1>6</ChildTag1>
                                <ChildTag1>7</ChildTag1>
                                <ChildTag2>8</ChildTag2>
                                <ChildTag2>9</ChildTag2>
                                <ChildTag2>10</ChildTag2>
                                <OtherTag>LargeIrrelevantData</OtherTag>
                              </ParentTag>
                            </Root>'
                      ) AS SampleXML
)
SELECT * INTO dbo.CTE FROM CTE

SELECT ID,
       Root.ParentTag.value('@ParentTagID','NVARCHAR(MAX)') AS ParentTagID,
       RootParentTag1.ChildTag1.value('(text())[1]', 'NVARCHAR(MAX)') AS ChildTag1,
       NULL
FROM CTE
OUTER APPLY CTE.SampleXML.nodes('/Root/ParentTag') as Root(ParentTag)
OUTER APPLY Root.ParentTag.nodes('ChildTag1') as RootParentTag1(ChildTag1)
UNION
SELECT ID,
       Root.ParentTag.value('@ParentTagID','NVARCHAR(MAX)') AS ParentTagID,
       NULL,
       RootParentTag2.ChildTag2.value('(text())[1]', 'NVARCHAR(MAX)') AS ChildTag2
FROM CTE
OUTER APPLY CTE.SampleXML.nodes('/Root/ParentTag') as Root(ParentTag)
OUTER APPLY Root.ParentTag.nodes('ChildTag2') as RootParentTag2(ChildTag2)

Is the most efficient index:

EXEC sp_db_selective_xml_index [DATABASE], TRUE
ALTER TABLE CTE ADD CONSTRAINT PK_CTE PRIMARY KEY CLUSTERED (ID)

CREATE SELECTIVE XML INDEX SIX_CTE ON CTE(SampleXML)
FOR
( ParentTag = '/Root/ParentTag' AS XQUERY 'node()',
  ParentTagChildTag1 = '/Root/ParentTag/ChildTag1' AS XQUERY 'node()',
  ParentTagChildTag2 = '/Root/ParentTag/ChildTag2' AS XQUERY 'node()',
  ParentTagID = '/Root/ParentTag/@ParentTagID' AS XQUERY 'xs:string' MAXLENGTH(255),
      ChildTag1 = '/Root/ParentTag/ChildTag1/text()' AS XQUERY 'xs:double',
      ChildTag2 = '/Root/ParentTag/ChildTag2/text()' AS XQUERY 'xs:double'
)

I do not want the LargeIrrelevantData, inside the <OtherTag>, to be indexed.

I'm using the following query to track indexed columns inside the XML

SELECT
a.name AS column_name,
c.name AS data_type
FROM sys.columns a
INNER JOIN sys.indexes b
ON a.object_id = b.object_id
INNER JOIN sys.types c
ON a.system_type_id = c.user_type_id
WHERE b.name = 'SIX_CTE'
AND b.type = 1
ORDER BY column_id

A few related questions:

How else can I determine the performance increase of my selective index besides reviewing query execution times?

Is it preferred to use XQUERY's xs:double or SQL's INT when specifying data types?

Thank you.

Best Answer

How else can I determine the performance increase of my selective index besides reviewing query execution times?

You could also look at IO utilization and CPU. Perhaps most interesting when dealing with selective XML indexes is to figure out if the index is actually used by the query. To do that you can have a look at the execution plan.

If you see one of the Table-valued functions responsible for parsing the XML then you know that you have not covered your XML query with a XML index.

What you should have instead is a scan or seek operator on the internal table used for the XML index.

Is it preferred to use XQUERY's xs:double or SQL's INT when specifying data types?

If you want the index to be used for the value() function you should match the data type in the index with the data type used in the query. You have nvarchar(max) in the query and xs:double in the index so ChildTag1 and ChildTag2 is not used in your query. It is instead parsing the XML data using the table valued functions.

To make your query use only the indexes (no xml parsing functions) you can change the query and the index to this.

create selective xml index SIX_CTE on CTE(SampleXML) for
(
  ParentTag = '/Root/ParentTag' as xquery 'node()',
  ParentTagID = '/Root/ParentTag/@ParentTagID' as sql nvarchar(255),
  ChildTag1 = '/Root/ParentTag/ChildTag1' as sql int,
  ChildTag2 = '/Root/ParentTag/ChildTag2' as sql int
);

Note that the node() indexes on ChildTag1 and ChildTag2 is not used by your query so not needed in the index.

SELECT ID,
       Root.ParentTag.value('@ParentTagID','NVARCHAR(255)') AS ParentTagID,
       RootParentTag1.ChildTag1.value('.', 'INT') AS ChildTag1,
       NULL
FROM CTE
OUTER APPLY CTE.SampleXML.nodes('/Root/ParentTag') as Root(ParentTag)
OUTER APPLY Root.ParentTag.nodes('ChildTag1') as RootParentTag1(ChildTag1)
UNION
SELECT ID,
       Root.ParentTag.value('@ParentTagID','NVARCHAR(255)') AS ParentTagID,
       NULL,
       RootParentTag2.ChildTag2.value('.', 'INT') AS ChildTag2
FROM CTE
OUTER APPLY CTE.SampleXML.nodes('/Root/ParentTag') as Root(ParentTag)
OUTER APPLY Root.ParentTag.nodes('ChildTag2') as RootParentTag2(ChildTag2);

What is the most effective SELECTIVE XML index?

Depends entirely on your use case and what the data looks like that you have. This answer is in no way an attempt to optimize your query, only a way to show you how to make use of the indexes. The result of testing on your data could show that you get the best performance by not using any indexes at all, or perhaps use the index only on some parts of the query. As a general rule it is always good to keep an index as small as possible so using int instead of float is a good thing if you know that you are dealing with integers.

Related Solutions

Sql-server – Selective Xml Indexing

A selective index will not be used when using query to retrieve the data.

From Selective XML Indexes (SXI) - Supported XML Features

Selective XML indexes support the XQuery supported by SQL Server inside the exist(), value() and nodes() methods.

For the query() and modify() methods, selective XML indexes may be used for node filtering only.

For the query() method, selective XML indexes are not used to retrieve results.

However testing shows that the index is used when you rewrite the query using nodes. These queries returns the same but the second is faster because it uses the the selective XML index.

select data.query('/Log/CallStack/ExceptionThread/Line')
from LogData


select T.X.query('.')
from LogData
  cross apply data.nodes('/Log/CallStack/ExceptionThread/Line') as T(X)

The selective XML index used in the second query is defined as..

[Line] = '/Log/CallStack/ExceptionThread/Line' as XQUERY 'node()',

I used a table of my own just to be able to show what the difference in this case is.

(67992 row(s) affected)
Table 'LogData'. Scan count 1, logical reads 388, physical reads 0, read-ahead reads 0, lob logical reads 1840865, lob physical reads 313514, lob read-ahead reads 24.

 SQL Server Execution Times:
   CPU time = 129792 ms,  elapsed time = 180087 ms.

(1850106 row(s) affected)
Table 'xml_sxi_table_622625261_256000'. Scan count 1, logical reads 8416, physical reads 3, read-ahead reads 8403, lob logical reads 0, lob physical reads 0, lob read-ahead reads 0.
Table 'LogData'. Scan count 1, logical reads 388, physical reads 0, read-ahead reads 0, lob logical reads 927882, lob physical reads 0, lob read-ahead reads 24.

 SQL Server Execution Times:
   CPU time = 65271 ms,  elapsed time = 67636 ms.

Sql-server – Sql Server 2012 Extended Events for Selective Xml Indexing not showing results

Not sure what you are seeing and not seeing in the event log but it works as expected for me (version 11.0.3000).

The mistake you did with your extra path expressions is that you forgot Stuff in the path.

Testing this with the extended events and looking at the execution plan for operator Table Valued Function XML Reader ... (there should be none), I found that one extra path is enough for your queries to make it use only the index to fetch the data [4] = '/Root/Stuff/Forms' as SQL nvarchar(25). You can have [3] = '/Root/Stuff/Forms' as XQUERY 'node()' in there as well and perhaps it will make a difference or not depending on what your XML look like.

Note 1: You have mixed up the names of the columns in the index and your query. That might contribute to the problems you have with extended event.

Note 2: The events for missing selective XML indexes is generated when the query is compiled. If the query plan is used from the cache you will not see any events.

Best Answer

Related Solutions

Sql-server – Selective Xml Indexing

Sql-server – Sql Server 2012 Extended Events for Selective Xml Indexing not showing results

Related Question