Sql-server – Xml index, slow count of rows

execution-planindexperformancequery-performancesql serverxml

I am not getting index seek in the following scenario.
Instead of inserting <Number>xxx</Number> in xml column like in this post Why is the secondary selective index not used when the where clause filters on `value()`?, insert 100k rows with this xml <SomeText>NiceText</SomeText> and similar amount of rows of this <SomeText>MoreText</SomeText>. Doesnt need to be 100k. Just needs to be many.
Then add index

create selective xml index SIX_T on dbo.T(XMLDoc) for
(
    pathXQUERY = '/SomeText' as xquery 'xs:string' maxlength(8) singleton
);

And secondary index

create xml index SIX_T_pathXQUERY on dbo.T(XMLDoc)
  using xml index SIX_T for (pathXQUERY);

Then do a count

select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1;

Notice that it doesnt use index seek and is "slow". Can take several seconds with millions of rows.
If i insert same values in a standard column and add an index to it and do a

select count(id)
    from dbo.T as T where SomeTextColumn = 'MoreText'

i get results instantly.
All tests done on sql server 18.3.1

The question is, how can i make counting by xml as fast as counting by column?

Thanks

Best Answer

Data differences

With a low amount of records to seek on, the optimizer is able to use the SIX_T_pathXQUERY index:

enter image description here

and filter on moretext with a seek predicate:

enter image description here

An interesting part here is that it executes the key lookup to get the path_1_id values that are not null.

enter image description here

Since that is a filter definition on the nonclustered xml index...

enter image description here

...while not being present in the index itself.

As a result, for the optimizer to consider using the index it knows that it has to complete these steps:

  • Filter on the XML value with the secondary index on the internal table
  • Match these returned values with the SIX_T clustered index on the internal table and filter on path_1_id is not null since path_1_id is not included in the secondary index
  • Match these values with the actual table, dbo.T to return the ID to count on

Tipping points

When the expected rows to be returned are higher, it favours using the selective clustered XML Index, to be able to run a merge join & no key lookup instead:

enter image description here

with a residual predicate:

enter image description here

To filter on the xml column and path_1_id

Comparing the plans

You could (but you shouldn't) use the USE PLAN hint to force the plan with the seek on the XML column and see what would happen if we where to search on these values.

enter image description here

With the execution time =

   CPU time = 218 ms,  elapsed time = 215 ms.

And the execution time for the scan + merge join plan:

   CPU time = 62 ms,  elapsed time = 58 ms.

In short, I believe that the scan with residual predicate + merge join choice was the right choice by the optimizer.

No way around it

While I might be wrong, I don't think there is a way to improve the count query with regular XML Indexes. We also cannot change these internal tables or even query them:

SELECT * FROM 
[sys].[xml_sxi_table_1463676262_256000];

It even gives a missing index hint on the merge join query plan, which you cannot create:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [sys].[xml_sxi_table_1463676262_256000] ([pathXQUERY_1_value])
INCLUDE ([path_1_id])

To improve the queries, you would have to look into non-xml index solutions.


Edit

How do i use USE PLAN hint? You have example code for this query?

I would advise against this, but for testing purposes it would be fine.

Step 1: You would have to get the actual execution plan of the query and get the xml:

enter image description here

If you don't have the query with the low estimates, enter a value that does not exist like:

select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "bbb"]') = 1

Step 2: Replace all 's with ''s in the execution plan xml, we will need this later.

enter image description here

Step 3: Paste the plan between OPTION( USE PLAN '')

SELECT count(*)
FROM dbo.T as T
WHERE T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1
OPTION(USE PLAN 
'');

Step 4: I had to change utf-16 to utf-8 to get the use plan hint to work

From:

OPTION(USE PLAN
'<?xml version="1.0" encoding="utf-16"?>

To:

OPTION(USE PLAN
'<?xml version="1.0" encoding="utf-8"?>

Step 5: Run the query.

My query now looks like this:

select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1
OPTION(USE PLAN
'<?xml version="1.0" encoding="utf-8"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.481" Build="14.0.3223.3" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1" StatementEstRows="1" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" CardinalityEstimationModelVersion="70" StatementSubTreeCost="0.00986014" StatementText="select count(*)&#xD;&#xA;from dbo.T as T&#xD;&#xA;where T.XMLDoc.exist(''/SomeText[. eq &quot;bbb&quot;]'') = 1" StatementType="SELECT" QueryHash="0x412154B6AD55BBFC" QueryPlanHash="0x280B174BF20902E3" RetrievedFromCache="true" SecurityPolicyApplied="false">
          <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
          <QueryPlan DegreeOfParallelism="1" CachedPlanSize="40" CompileTime="394" CompileCPU="301" CompileMemory="648">
            <MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
            <OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="419404" EstimatedPagesCached="52425" EstimatedAvailableDegreeOfParallelism="2" MaxCompileMemory="3655432" />
            <QueryTimeStats CpuTime="0" ElapsedTime="0" />
            <RelOp AvgRowSize="11" EstimateCPU="0" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.00986014">
              <OutputList>
                <ColumnReference Column="Expr1017" />
              </OutputList>
              <ComputeScalar>
                <DefinedValues>
                  <DefinedValue>
                    <ColumnReference Column="Expr1017" />
                    <ScalarOperator ScalarString="CONVERT_IMPLICIT(int,[Expr1020],0)">
                      <Convert DataType="int" Style="0" Implicit="true">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Column="Expr1020" />
                          </Identifier>
                        </ScalarOperator>
                      </Convert>
                    </ScalarOperator>
                  </DefinedValue>
                </DefinedValues>
                <RelOp AvgRowSize="11" EstimateCPU="1.1E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Aggregate" NodeId="1" Parallel="false" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="0.00986014">
                  <OutputList>
                    <ColumnReference Column="Expr1020" />
                  </OutputList>
                  <RunTimeInformation>
                    <RunTimeCountersPerThread Thread="0" ActualRows="1" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" />
                  </RunTimeInformation>
                  <StreamAggregate>
                    <DefinedValues>
                      <DefinedValue>
                        <ColumnReference Column="Expr1020" />
                        <ScalarOperator ScalarString="Count(*)">
                          <Aggregate AggType="countstar" Distinct="false" />
                        </ScalarOperator>
                      </DefinedValue>
                    </DefinedValues>
                    <RelOp AvgRowSize="9" EstimateCPU="1E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Aggregate" NodeId="3" Parallel="false" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="0.00985904">
                      <OutputList />
                      <RunTimeInformation>
                        <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" />
                      </RunTimeInformation>
                      <StreamAggregate>
                        <DefinedValues />
                        <GroupBy>
                          <ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
                        </GroupBy>
                        <RelOp AvgRowSize="11" EstimateCPU="4.18E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Inner Join" NodeId="4" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.00985804">
                          <OutputList>
                            <ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
                          </OutputList>
                          <RunTimeInformation>
                            <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" />
                          </RunTimeInformation>
                          <NestedLoops Optimized="false">
                            <OuterReferences>
                              <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                            </OuterReferences>
                            <RelOp AvgRowSize="16" EstimateCPU="4.18E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Inner Join" NodeId="5" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.00657038">
                              <OutputList>
                                <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                              </OutputList>
                              <RunTimeInformation>
                                <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" />
                              </RunTimeInformation>
                              <NestedLoops Optimized="false">
                                <OuterReferences>
                                  <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                                  <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
                                </OuterReferences>
                                <RelOp AvgRowSize="15" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" EstimatedRowsRead="1" LogicalOp="Index Seek" NodeId="6" Parallel="false" PhysicalOp="Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="150002">
                                  <OutputList>
                                    <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                                    <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
                                  </OutputList>
                                  <RunTimeInformation>
                                    <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" ActualScans="1" ActualLogicalReads="3" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
                                  </RunTimeInformation>
                                  <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
                                    <DefinedValues>
                                      <DefinedValue>
                                        <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                                      </DefinedValue>
                                      <DefinedValue>
                                        <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
                                      </DefinedValue>
                                    </DefinedValues>
                                    <Object Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Index="[SIX_T_pathXQUERY]" Filtered="true" Alias="[SomeText:1]" IndexKind="SecondarySelectiveXML" Storage="RowStore" />
                                    <SeekPredicates>
                                      <SeekPredicateNew>
                                        <SeekKeys>
                                          <Prefix ScanType="EQ">
                                            <RangeColumns>
                                              <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pathXQUERY_1_value" />
                                            </RangeColumns>
                                            <RangeExpressions>
                                              <ScalarOperator ScalarString="N''bbb''">
                                                <Const ConstValue="N''bbb''" />
                                              </ScalarOperator>
                                            </RangeExpressions>
                                          </Prefix>
                                        </SeekKeys>
                                      </SeekPredicateNew>
                                    </SeekPredicates>
                                  </IndexScan>
                                </RelOp>
                                <RelOp AvgRowSize="461" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Clustered Index Seek" NodeId="8" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="150002">
                                  <OutputList />
                                  <RunTimeInformation>
                                    <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="0" ActualExecutions="0" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" ActualScans="0" ActualLogicalReads="0" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
                                  </RunTimeInformation>
                                  <IndexScan Lookup="true" Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
                                    <DefinedValues />
                                    <Object Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Index="[SIX_T]" Alias="[SomeText:1]" TableReferenceId="-1" IndexKind="SelectiveXML" Storage="RowStore" />
                                    <SeekPredicates>
                                      <SeekPredicateNew>
                                        <SeekKeys>
                                          <Prefix ScanType="EQ">
                                            <RangeColumns>
                                              <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                                              <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
                                            </RangeColumns>
                                            <RangeExpressions>
                                              <ScalarOperator ScalarString="[adventureworks].[sys].[xml_sxi_table_1463676262_256000].[pk1] as [SomeText:1].[pk1]">
                                                <Identifier>
                                                  <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                                                </Identifier>
                                              </ScalarOperator>
                                              <ScalarOperator ScalarString="[adventureworks].[sys].[xml_sxi_table_1463676262_256000].[row_id] as [SomeText:1].[row_id]">
                                                <Identifier>
                                                  <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
                                                </Identifier>
                                              </ScalarOperator>
                                            </RangeExpressions>
                                          </Prefix>
                                        </SeekKeys>
                                      </SeekPredicateNew>
                                    </SeekPredicates>
                                    <Predicate>
                                      <ScalarOperator ScalarString="[adventureworks].[sys].[xml_sxi_table_1463676262_256000].[path_1_id] as [SomeText:1].[path_1_id] IS NOT NULL">
                                        <Logical Operation="IS NOT NULL">
                                          <ScalarOperator>
                                            <Identifier>
                                              <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="path_1_id" />
                                            </Identifier>
                                          </ScalarOperator>
                                        </Logical>
                                      </ScalarOperator>
                                    </Predicate>
                                  </IndexScan>
                                </RelOp>
                              </NestedLoops>
                            </RelOp>
                            <RelOp AvgRowSize="11" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" EstimatedRowsRead="1" LogicalOp="Clustered Index Seek" NodeId="9" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="200002">
                              <OutputList>
                                <ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
                              </OutputList>
                              <RunTimeInformation>
                                <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="0" ActualExecutions="0" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" ActualScans="0" ActualLogicalReads="0" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
                              </RunTimeInformation>
                              <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
                                <DefinedValues>
                                  <DefinedValue>
                                    <ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
                                  </DefinedValue>
                                </DefinedValues>
                                <Object Database="[adventureworks]" Schema="[dbo]" Table="[T]" Index="[PK__T__3214EC27EF7043C5]" Alias="[T]" IndexKind="Clustered" Storage="RowStore" />
                                <SeekPredicates>
                                  <SeekPredicateNew>
                                    <SeekKeys>
                                      <Prefix ScanType="EQ">
                                        <RangeColumns>
                                          <ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
                                        </RangeColumns>
                                        <RangeExpressions>
                                          <ScalarOperator ScalarString="[adventureworks].[sys].[xml_sxi_table_1463676262_256000].[pk1] as [SomeText:1].[pk1]">
                                            <Identifier>
                                              <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                                            </Identifier>
                                          </ScalarOperator>
                                        </RangeExpressions>
                                      </Prefix>
                                    </SeekKeys>
                                  </SeekPredicateNew>
                                </SeekPredicates>
                              </IndexScan>
                            </RelOp>
                          </NestedLoops>
                        </RelOp>
                      </StreamAggregate>
                    </RelOp>
                  </StreamAggregate>
                </RelOp>
              </ComputeScalar>
            </RelOp>
          </QueryPlan>
        </StmtSimple>
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>');