Sql-server – Performance issue with xpath in SQL Server 2008

performancesql-server-2008

I have a table with lots of large xml-documents.

When I run xpath expressions to select data from those documents I run into a peculiar performance issue.

My query is

SELECT
p.n.value('.', 'int') AS PurchaseOrderID
,x.ProductID 
FROM XmlLoadData x
CROSS APPLY x.PayLoad.nodes('declare namespace NS="http://schemas.datacontract.org/2004/07/XmlDbPerfTest"; 
/NS:ProductAndRelated[1]/NS:Product[1]/NS:PurchaseOrderDetails[1]/NS:PurchaseOrderDetail/NS:PurchaseOrderID[1]') p(n)

The query takes 2 minutes and 8 seconds.

When I remove the [1] parts of the single occurance nodes like this:

SELECT
p.n.value('.', 'int') AS PurchaseOrderID
,x.ProductID 
FROM XmlLoadData x
CROSS APPLY x.PayLoad.nodes('declare namespace NS="http://schemas.datacontract.org/2004/07/XmlDbPerfTest"; 
/NS:ProductAndRelated/NS:Product/NS:PurchaseOrderDetails/NS:PurchaseOrderDetail/NS:PurchaseOrderID') p(n)

The execution time drops to just 18 seconds.
Since the [1]-nodes occurs just once in each parent node in the documents the results are the same except for ordering.

Actual execution plan for the first (slow) query is
Plan for query 1

and the second (faster) query is

enter image description here

Query 1 full screen Query 2 full screen.

As far as I can see the query with [1] does the same execution as the query without, but with the addition of some extra calculation steps to find the first item.

My question is why the second query is faster.
I would have expected the execution of the query with [1] to break early when a match was found and thus reduce the execution time instead of the opposite.
Are there any reasons why the execution does not break early with [1] and thus reduce the execution time.

This is my table

CREATE TABLE [dbo].[XmlLoadData](
    [ProductID] [int] NOT NULL,
    [PayLoad] [xml] NOT NULL,
    [Size]  AS (len(CONVERT([nvarchar](max),[PayLoad],0))),
 CONSTRAINT [PK_XmlLoadData] PRIMARY KEY CLUSTERED 
(
    [ProductID] ASC
)WITH (PAD_INDEX  = OFF, STATISTICS_NORECOMPUTE  = OFF, 
       IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS  = ON, 
       ALLOW_PAGE_LOCKS  = ON) ON [PRIMARY]
) ON [PRIMARY]

Edit:
Performance numbers from SQL Profiler:

Query 1:

CPU     Reads   Writes  Duration 
126251  1224892 0       129797

Query 2:

CPU     Reads   Writes  Duration 
50124   612499  0       16307

Best Answer

The second query uses parallelism. That is, it was expensive enough for the optimizer to shut its eyes to the additional overhead.

I'd guess the second query tells optimizer to "dump everything", which is performed with a paralleled scan. SQL Server likes to "dump everything" in this way when asked.
Whereas the first query asks for "analyze and then give some." The optimizer has no way of knowing there's only one node anyway, so the execution plan it ends up picking is very different.

I'd say it's similar to situation when one table scan is cheaper than many index seeks.