Sql-server – Xml index, slow count of rows

execution-planindexperformancequery-performancesql serverxml

I am not getting index seek in the following scenario.
Instead of inserting <Number>xxx</Number> in xml column like in this post Why is the secondary selective index not used when the where clause filters on `value()`?, insert 100k rows with this xml <SomeText>NiceText</SomeText> and similar amount of rows of this <SomeText>MoreText</SomeText>. Doesnt need to be 100k. Just needs to be many.
Then add index

create selective xml index SIX_T on dbo.T(XMLDoc) for
(
    pathXQUERY = '/SomeText' as xquery 'xs:string' maxlength(8) singleton
);

And secondary index

create xml index SIX_T_pathXQUERY on dbo.T(XMLDoc)
  using xml index SIX_T for (pathXQUERY);

Then do a count

select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1;

Notice that it doesnt use index seek and is "slow". Can take several seconds with millions of rows.
If i insert same values in a standard column and add an index to it and do a

select count(id)
    from dbo.T as T where SomeTextColumn = 'MoreText'

i get results instantly.
All tests done on sql server 18.3.1

The question is, how can i make counting by xml as fast as counting by column?

Thanks

Best Answer

Data differences

With a low amount of records to seek on, the optimizer is able to use the SIX_T_pathXQUERY index:

and filter on moretext with a seek predicate:

An interesting part here is that it executes the key lookup to get the path_1_id values that are not null.

Since that is a filter definition on the nonclustered xml index...

...while not being present in the index itself.

As a result, for the optimizer to consider using the index it knows that it has to complete these steps:

Filter on the XML value with the secondary index on the internal table
Match these returned values with the SIX_T clustered index on the internal table and filter on path_1_id is not null since path_1_id is not included in the secondary index
Match these values with the actual table, dbo.T to return the ID to count on

Tipping points

When the expected rows to be returned are higher, it favours using the selective clustered XML Index, to be able to run a merge join & no key lookup instead:

with a residual predicate:

To filter on the xml column and path_1_id

Comparing the plans

You could (but you shouldn't) use the USE PLAN hint to force the plan with the seek on the XML column and see what would happen if we where to search on these values.

With the execution time =

   CPU time = 218 ms,  elapsed time = 215 ms.

And the execution time for the scan + merge join plan:

   CPU time = 62 ms,  elapsed time = 58 ms.

In short, I believe that the scan with residual predicate + merge join choice was the right choice by the optimizer.

No way around it

While I might be wrong, I don't think there is a way to improve the count query with regular XML Indexes. We also cannot change these internal tables or even query them:

SELECT * FROM 
[sys].[xml_sxi_table_1463676262_256000];

It even gives a missing index hint on the merge join query plan, which you cannot create:

CREATE NONCLUSTERED INDEX [<Name of Missing Index, sysname,>]
ON [sys].[xml_sxi_table_1463676262_256000] ([pathXQUERY_1_value])
INCLUDE ([path_1_id])

To improve the queries, you would have to look into non-xml index solutions.

Edit

How do i use USE PLAN hint? You have example code for this query?

I would advise against this, but for testing purposes it would be fine.

Step 1: You would have to get the actual execution plan of the query and get the xml:

If you don't have the query with the low estimates, enter a value that does not exist like:

select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "bbb"]') = 1

Step 2: Replace all 's with ''s in the execution plan xml, we will need this later.

Step 3: Paste the plan between OPTION( USE PLAN '')

SELECT count(*)
FROM dbo.T as T
WHERE T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1
OPTION(USE PLAN 
'');

Step 4: I had to change utf-16 to utf-8 to get the use plan hint to work

From:

OPTION(USE PLAN
'<?xml version="1.0" encoding="utf-16"?>

To:

OPTION(USE PLAN
'<?xml version="1.0" encoding="utf-8"?>

Step 5: Run the query.

My query now looks like this:

select count(*)
from dbo.T as T
where T.XMLDoc.exist('/SomeText[. eq "MoreText"]') = 1
OPTION(USE PLAN
'<?xml version="1.0" encoding="utf-8"?>
<ShowPlanXML xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" Version="1.481" Build="14.0.3223.3" xmlns="http://schemas.microsoft.com/sqlserver/2004/07/showplan">
  <BatchSequence>
    <Batch>
      <Statements>
        <StmtSimple StatementCompId="1" StatementEstRows="1" StatementId="1" StatementOptmLevel="FULL" StatementOptmEarlyAbortReason="GoodEnoughPlanFound" CardinalityEstimationModelVersion="70" StatementSubTreeCost="0.00986014" StatementText="select count(*)&#xD;&#xA;from dbo.T as T&#xD;&#xA;where T.XMLDoc.exist(''/SomeText[. eq &quot;bbb&quot;]'') = 1" StatementType="SELECT" QueryHash="0x412154B6AD55BBFC" QueryPlanHash="0x280B174BF20902E3" RetrievedFromCache="true" SecurityPolicyApplied="false">
          <StatementSetOptions ANSI_NULLS="true" ANSI_PADDING="true" ANSI_WARNINGS="true" ARITHABORT="true" CONCAT_NULL_YIELDS_NULL="true" NUMERIC_ROUNDABORT="false" QUOTED_IDENTIFIER="true" />
          <QueryPlan DegreeOfParallelism="1" CachedPlanSize="40" CompileTime="394" CompileCPU="301" CompileMemory="648">
            <MemoryGrantInfo SerialRequiredMemory="0" SerialDesiredMemory="0" />
            <OptimizerHardwareDependentProperties EstimatedAvailableMemoryGrant="419404" EstimatedPagesCached="52425" EstimatedAvailableDegreeOfParallelism="2" MaxCompileMemory="3655432" />
            <QueryTimeStats CpuTime="0" ElapsedTime="0" />
            <RelOp AvgRowSize="11" EstimateCPU="0" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Compute Scalar" NodeId="0" Parallel="false" PhysicalOp="Compute Scalar" EstimatedTotalSubtreeCost="0.00986014">
              <OutputList>
                <ColumnReference Column="Expr1017" />
              </OutputList>
              <ComputeScalar>
                <DefinedValues>
                  <DefinedValue>
                    <ColumnReference Column="Expr1017" />
                    <ScalarOperator ScalarString="CONVERT_IMPLICIT(int,[Expr1020],0)">
                      <Convert DataType="int" Style="0" Implicit="true">
                        <ScalarOperator>
                          <Identifier>
                            <ColumnReference Column="Expr1020" />
                          </Identifier>
                        </ScalarOperator>
                      </Convert>
                    </ScalarOperator>
                  </DefinedValue>
                </DefinedValues>
                <RelOp AvgRowSize="11" EstimateCPU="1.1E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Aggregate" NodeId="1" Parallel="false" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="0.00986014">
                  <OutputList>
                    <ColumnReference Column="Expr1020" />
                  </OutputList>
                  <RunTimeInformation>
                    <RunTimeCountersPerThread Thread="0" ActualRows="1" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" />
                  </RunTimeInformation>
                  <StreamAggregate>
                    <DefinedValues>
                      <DefinedValue>
                        <ColumnReference Column="Expr1020" />
                        <ScalarOperator ScalarString="Count(*)">
                          <Aggregate AggType="countstar" Distinct="false" />
                        </ScalarOperator>
                      </DefinedValue>
                    </DefinedValues>
                    <RelOp AvgRowSize="9" EstimateCPU="1E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Aggregate" NodeId="3" Parallel="false" PhysicalOp="Stream Aggregate" EstimatedTotalSubtreeCost="0.00985904">
                      <OutputList />
                      <RunTimeInformation>
                        <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" />
                      </RunTimeInformation>
                      <StreamAggregate>
                        <DefinedValues />
                        <GroupBy>
                          <ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
                        </GroupBy>
                        <RelOp AvgRowSize="11" EstimateCPU="4.18E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Inner Join" NodeId="4" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.00985804">
                          <OutputList>
                            <ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
                          </OutputList>
                          <RunTimeInformation>
                            <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" />
                          </RunTimeInformation>
                          <NestedLoops Optimized="false">
                            <OuterReferences>
                              <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                            </OuterReferences>
                            <RelOp AvgRowSize="16" EstimateCPU="4.18E-06" EstimateIO="0" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Inner Join" NodeId="5" Parallel="false" PhysicalOp="Nested Loops" EstimatedTotalSubtreeCost="0.00657038">
                              <OutputList>
                                <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                              </OutputList>
                              <RunTimeInformation>
                                <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" />
                              </RunTimeInformation>
                              <NestedLoops Optimized="false">
                                <OuterReferences>
                                  <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                                  <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
                                </OuterReferences>
                                <RelOp AvgRowSize="15" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" EstimatedRowsRead="1" LogicalOp="Index Seek" NodeId="6" Parallel="false" PhysicalOp="Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="150002">
                                  <OutputList>
                                    <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                                    <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
                                  </OutputList>
                                  <RunTimeInformation>
                                    <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="1" ActualExecutions="1" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" ActualScans="1" ActualLogicalReads="3" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
                                  </RunTimeInformation>
                                  <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
                                    <DefinedValues>
                                      <DefinedValue>
                                        <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                                      </DefinedValue>
                                      <DefinedValue>
                                        <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
                                      </DefinedValue>
                                    </DefinedValues>
                                    <Object Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Index="[SIX_T_pathXQUERY]" Filtered="true" Alias="[SomeText:1]" IndexKind="SecondarySelectiveXML" Storage="RowStore" />
                                    <SeekPredicates>
                                      <SeekPredicateNew>
                                        <SeekKeys>
                                          <Prefix ScanType="EQ">
                                            <RangeColumns>
                                              <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pathXQUERY_1_value" />
                                            </RangeColumns>
                                            <RangeExpressions>
                                              <ScalarOperator ScalarString="N''bbb''">
                                                <Const ConstValue="N''bbb''" />
                                              </ScalarOperator>
                                            </RangeExpressions>
                                          </Prefix>
                                        </SeekKeys>
                                      </SeekPredicateNew>
                                    </SeekPredicates>
                                  </IndexScan>
                                </RelOp>
                                <RelOp AvgRowSize="461" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" LogicalOp="Clustered Index Seek" NodeId="8" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="150002">
                                  <OutputList />
                                  <RunTimeInformation>
                                    <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="0" ActualExecutions="0" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" ActualScans="0" ActualLogicalReads="0" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
                                  </RunTimeInformation>
                                  <IndexScan Lookup="true" Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
                                    <DefinedValues />
                                    <Object Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Index="[SIX_T]" Alias="[SomeText:1]" TableReferenceId="-1" IndexKind="SelectiveXML" Storage="RowStore" />
                                    <SeekPredicates>
                                      <SeekPredicateNew>
                                        <SeekKeys>
                                          <Prefix ScanType="EQ">
                                            <RangeColumns>
                                              <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                                              <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
                                            </RangeColumns>
                                            <RangeExpressions>
                                              <ScalarOperator ScalarString="[adventureworks].[sys].[xml_sxi_table_1463676262_256000].[pk1] as [SomeText:1].[pk1]">
                                                <Identifier>
                                                  <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                                                </Identifier>
                                              </ScalarOperator>
                                              <ScalarOperator ScalarString="[adventureworks].[sys].[xml_sxi_table_1463676262_256000].[row_id] as [SomeText:1].[row_id]">
                                                <Identifier>
                                                  <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="row_id" />
                                                </Identifier>
                                              </ScalarOperator>
                                            </RangeExpressions>
                                          </Prefix>
                                        </SeekKeys>
                                      </SeekPredicateNew>
                                    </SeekPredicates>
                                    <Predicate>
                                      <ScalarOperator ScalarString="[adventureworks].[sys].[xml_sxi_table_1463676262_256000].[path_1_id] as [SomeText:1].[path_1_id] IS NOT NULL">
                                        <Logical Operation="IS NOT NULL">
                                          <ScalarOperator>
                                            <Identifier>
                                              <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="path_1_id" />
                                            </Identifier>
                                          </ScalarOperator>
                                        </Logical>
                                      </ScalarOperator>
                                    </Predicate>
                                  </IndexScan>
                                </RelOp>
                              </NestedLoops>
                            </RelOp>
                            <RelOp AvgRowSize="11" EstimateCPU="0.0001581" EstimateIO="0.003125" EstimateRebinds="0" EstimateRewinds="0" EstimatedExecutionMode="Row" EstimateRows="1" EstimatedRowsRead="1" LogicalOp="Clustered Index Seek" NodeId="9" Parallel="false" PhysicalOp="Clustered Index Seek" EstimatedTotalSubtreeCost="0.0032831" TableCardinality="200002">
                              <OutputList>
                                <ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
                              </OutputList>
                              <RunTimeInformation>
                                <RunTimeCountersPerThread Thread="0" ActualRows="0" Batches="0" ActualEndOfScans="0" ActualExecutions="0" ActualExecutionMode="Row" ActualElapsedms="0" ActualCPUms="0" ActualScans="0" ActualLogicalReads="0" ActualPhysicalReads="0" ActualReadAheads="0" ActualLobLogicalReads="0" ActualLobPhysicalReads="0" ActualLobReadAheads="0" />
                              </RunTimeInformation>
                              <IndexScan Ordered="true" ScanDirection="FORWARD" ForcedIndex="false" ForceSeek="false" ForceScan="false" NoExpandHint="false" Storage="RowStore">
                                <DefinedValues>
                                  <DefinedValue>
                                    <ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
                                  </DefinedValue>
                                </DefinedValues>
                                <Object Database="[adventureworks]" Schema="[dbo]" Table="[T]" Index="[PK__T__3214EC27EF7043C5]" Alias="[T]" IndexKind="Clustered" Storage="RowStore" />
                                <SeekPredicates>
                                  <SeekPredicateNew>
                                    <SeekKeys>
                                      <Prefix ScanType="EQ">
                                        <RangeColumns>
                                          <ColumnReference Database="[adventureworks]" Schema="[dbo]" Table="[T]" Alias="[T]" Column="ID" />
                                        </RangeColumns>
                                        <RangeExpressions>
                                          <ScalarOperator ScalarString="[adventureworks].[sys].[xml_sxi_table_1463676262_256000].[pk1] as [SomeText:1].[pk1]">
                                            <Identifier>
                                              <ColumnReference Database="[adventureworks]" Schema="[sys]" Table="[xml_sxi_table_1463676262_256000]" Alias="[SomeText:1]" Column="pk1" />
                                            </Identifier>
                                          </ScalarOperator>
                                        </RangeExpressions>
                                      </Prefix>
                                    </SeekKeys>
                                  </SeekPredicateNew>
                                </SeekPredicates>
                              </IndexScan>
                            </RelOp>
                          </NestedLoops>
                        </RelOp>
                      </StreamAggregate>
                    </RelOp>
                  </StreamAggregate>
                </RelOp>
              </ComputeScalar>
            </RelOp>
          </QueryPlan>
        </StmtSimple>
      </Statements>
    </Batch>
  </BatchSequence>
</ShowPlanXML>');

Related Solutions

Sql-server – How to improve performance if index is not Most SELECTIVE

Without having much detail, I can't recommend much.

One thing that does jump out at me is that it's very likely you can improve performance on the table by normalizing it! The presence of so many duplicated (so few unique) values in the columns you listed suggests that perhaps many others in the table are not normalized, as well. I'm suggesting making the Name column an int (or even smallint) with a foreign key to a Names table, and the Current_state column bit (or alternately a tinyint) with a foreign key to a WhateverStates table. You would have to, of course, change your data access code to deal with this indirection, but that is nothing more than the basic job relational database developers have always had to do.

Normalizing will reduce the number of bytes per row, increasing the number of rows per page, reducing the number of pages that have to be read to satisfy any particular query, helping performance across the board! Right now the columns given require likely close to 34 bytes each. After the change I suggest, those columns will only require 11 bytes each. Of course, I haven't seen your whole table--your rows may be so big that it doesn't matter.

What columns and datatypes are in the clustered index (if there is one)? This can radically affect the size of the nonclustered indexes, again affecting performance in exactly the way I described (rows per page).

When you do query based on non-selective columns such as Current_state, what other columns are always or almost always included? It may be okay for you to have a nonselective column in an index if the index also contains a more selective column (or that in conjunction with the less selective column is more highly selective). If on the other hand you generally query often for rows based on the single column Current_state = 'Pending', then you can add a filtered index:

CREATE INDEX IX_YourTable_Pending ON dbo.YourTable (ClusteredColumnsInOrder)
WHERE Current_State = 'Pending'; --SQL 2008 and up only

This technique could help you even when you also include other columns: you would want to put those in the index instead of the ClusteredColumnsInOrder columns I suggested (which was just a tricky way to not put any additional columns into the nonclustered index, since--remember, now--nonclustered indexes always have all the columns of the clustered index implicitly included). Or, if you only pull a very few other columns, you can make your nonclustered index cover the query by adding INCLUDE (AdditionalColumn1, AdditionalColumn2) so that the query engine doesn't have to go back to the clustered index to satisfy the query.

You haven't provided very much information such as full table schema, sample data, and sample queries, and without those it's going to be pretty hard to give you very specific advice about what to do.

One thing I can say, though, is that indiscriminately throwing indexes at the table may not improve things much and could in fact hurt performance of your system overall.

If the hints I have given you here don't seem to help much, then I recommend that you do come back with some of the additional info I mentioned so that we can do a better job of assisting you.

SQL Server – Secondary Selective Index Not Used with WHERE Clause on value()

The declaration of singleton in the path expression of the index enforces that you can not add multiple <Number> elements but the XQuery compiler does not take that into consideration when interpreting the expression in the value() function. You have to specify [1] to make SQL Server happy. Using typed XML with a schema does not help with that either. And because of that SQL Server builds a query that uses something that could be called an "apply" pattern.

Easiest to demonstrate is to use regular tables instead of XML simulating the query we are actually executing against T and the internal table.

Here is the setup for the internal table as a real table.

create table dbo.xml_sxi_table
(
  pk1 int not null,
  row_id int,
  path_1_id varbinary(900),
  pathSQL_1_sql_value int,
  pathXQUERY_2_value float
);

go

create clustered index SIX_T on xml_sxi_table(pk1, row_id);
create nonclustered index SIX_pathSQL on xml_sxi_table(pathSQL_1_sql_value) where path_1_id is not null;
create nonclustered index SIX_T_pathXQUERY on xml_sxi_table(pathXQUERY_2_value) where path_1_id is not null;

go

insert into dbo.xml_sxi_table(pk1, row_id, path_1_id, pathSQL_1_sql_value, pathXQUERY_2_value)
select T.ID, 1, T.ID, T.ID, T.ID
from dbo.T;

With both tables in place you can execute the equivalent of the exist() query.

select count(*)
from dbo.T
where exists (
             select *
             from dbo.xml_sxi_table as S
             where S.pk1 = T.ID and
                   S.pathXQUERY_2_value = 314 and
                   S.path_1_id is not null
             );

The equivalent of the value() query would look like this.

select count(*)
from dbo.T
where (
      select top(1) S.pathSQL_1_sql_value
      from dbo.xml_sxi_table as S
      where S.pk1 = T.ID and
            S.path_1_id is not null
      order by S.path_1_id
      ) = 314;

The top(1) and order by S.path_1_id is the culprit and it is [1] in the Xpath expression that is to blame.

I don't think it is possible for Microsoft to fix this with the current structure of the internal table even if you were allowed to leave out the [1] from the values() function. They would probably have to create multiple internal tables for each path expression with unique constraints in place to guarantee for the optimizer that there can only be one <number> element for each row. Not sure that would actually be enough for the optimizer to "break out of the apply pattern".

For you who think this fun and interesting and since you are still reading this you probably are.

Some queries to look at the structure of the internal table.

select T.name, 
       T.internal_type_desc, 
       object_name(T.parent_id) as parent_table_name
from sys.internal_tables as T
where T.parent_id = object_id('T');

select C.name as column_name, 
       C.column_id,
       T.name as type_name,
       C.max_length,
       C.is_sparse,
       C.is_nullable
from sys.columns as C
  inner join sys.types as T
    on C.user_type_id = T.user_type_id
where C.object_id in (
                     select T.object_id 
                     from sys.internal_tables as T 
                     where T.parent_id = object_id('T')
                     )
order by C.column_id;

select I.name as index_name,
       I.type_desc,
       I.is_unique,
       I.filter_definition,
       IC.key_ordinal,
       C.name as column_name, 
       C.column_id,
       T.name as type_name,
       C.max_length,
       I.is_unique,
       I.is_unique_constraint
from sys.indexes as I
  inner join sys.index_columns as IC
    on I.object_id = IC.object_id and
       I.index_id = IC.index_id
  inner join sys.columns as C
    on IC.column_id = C.column_id and
       IC.object_id = C.object_id
  inner join sys.types as T
    on C.user_type_id = T.user_type_id
where I.object_id in (
                     select T.object_id 
                     from sys.internal_tables as T 
                     where T.parent_id = object_id('T')
                     );

Best Answer

Related Solutions

Sql-server – How to improve performance if index is not Most SELECTIVE

SQL Server – Secondary Selective Index Not Used with WHERE Clause on value()

Related Question