Sql-server – SQL Server : parsing with XPath Performance

execution-plansql serversql-server-2008stored-procedurest-sql

I have a UI that allows a user to select certain data points they want to filter the results by. There are things such as location, department etc. Once they have selected all of the options, I generate an XML string that I pass to my stored procedure.

From there, for locations for example I create a temp table and dump all of the locations that were selected by the user into that table which I then join my data on later.

When running the execution plan, I noticed that each one of the 5 or so queries that are parsing the filter data are costing about 12% which is over 60% of the query just to determine the data we are going to be filtering by.

DECLARE @tmLocations TABLE (
    location VARCHAR (100));
BEGIN
    INSERT INTO @tmLocations
    SELECT ParamValues.x1.value('location[1]', 'VARCHAR(200)')
    FROM   @xml.nodes('data/teammateLocations/locations') AS ParamValues(x1);
END

Is there another method of extracting data from XML or ways to improve the query like above so that it doesn't cost as much to run? Setting up the data is costing more than the actual filtering of the data which is really slowing down performance.

Best Answer

When running the execution plan, I noticed that each one of the 5 or so queries that are parsing the filter data are costing about 12% which is over 60% of the query just to determine the data we are going to be filtering by.

The query costs are based on estimates even in the actual execution plans. They do not tell you how efficient the query actually was.

The estimates are in turn based on statistics and those can be outdated giving you estimates and costs that are wrong.

The estimates for XML queries is always wrong. There is no statistics generated for XML columns and there is certainly no statistics generated for XML parameters or variables.

Have a look at this rather simple XML query.

declare @X xml;

select 1
from @X.nodes('*') as T(X);

Estimated query plan

enter image description here

SQL Server assumes there are 10000 elements in the XML and keeps on guessing from there. Using the nodes() function assumes that 200 of those will be returned. Before that there is a Filter operator that checks if @X is not null limiting the number of estimated rows to 66. Pure guesswork and not influenced at all by what data you actually have in your XML.

To know if a query is good enough you should have a look at things like duration, number of reads and allocated memory. Don't use the estimated cost and please don't use the percentages of individual queries to compare performance.

Your XML query could be improved as Mister Magoo suggests in a comment.

SELECT ParamValues.x1.value('(location/text())[1]', 'VARCHAR(200)')
FROM  @xml.nodes('data/teammateLocations/locations') AS ParamValues(x1);

If you don't specify the text() node, SQL server have to generate a plan that works with mixed content XML, concatenating all the node values from the sub nodes.