SQL Server – Performance of XML to Table Function in Stored Procedures

Tags: functions, sql-server, sql-server-2014, xml

In a number of the databases I'm helping to maintain, there is a pattern where the code passes in a list of IDs as an XML string to a stored procedure. A user-defined function turns them into a table which is then used to match IDs. The function that accomplishes this looks like this:

ALTER function [dbo].[XMLIdentifiers] (@xml xml)
returns table
as
return (
    --get the ids from the xml
    select Item.value('.', 'int') as id from @xml.nodes('//id') as T(Item)
)

This works fine when we test it in a wide variety of scenarios in SSMS. But if it's called within a stored procedure, it will run extremely slowly and the execution plan will show it spending the large majority of time parsing these IDs. This is true whether we write the results to a temp table, join to them or use them in a subquery.

Example from an execution plan using the above function:

[Image: execution plan]

Can anyone offer insight as to why we are seeing such poor performance from these queries? Is there a better way to parse the values from XML? Most of the databases using this pattern are on SQL Server 2014 or 2016.

Best Answer

To answer part of your question: yes, there is a better way to parse the values from the XML.

Always (*) extract the text() node from the XML element at the earliest opportunity.

(*) "It depends", so always test to make sure.

In your case, this means changing the nodes() call so that the text() node is part of the XPath expression:

ALTER function [dbo].[XMLIdentifiers] (@xml xml)
returns table
as
return (
    --get the ids from the xml
    select Item.value('.', 'int') as id from @xml.nodes('//id/text()') as T(Item)
)

The simple test below shows not only a dramatic improvement in the timing statistics but also a significantly different execution plan.

declare @x xml
select @x = (select top(100000) row_number() over(order by @@spid) as [n] from sys.columns as [a],sys.columns as [b] for xml auto,elements,type);

declare @c int;

set statistics io,time on;

-- Usual suspect 
select @c = nd.value('.','int')
from @x.nodes('//n') x(nd);

-- Using text() when we might also need other parts of the node
select @c = nd.value('(./text())[1]','int')
from @x.nodes('//n') x(nd);

-- Using text() when that is all we need from the node
select @c = nd.value('.','int')
from @x.nodes('//n/text()') x(nd);

set statistics io,time off;

RESULTS

Not using text(): CPU time = 1281 ms, elapsed time = 1459 ms.

Using text() late: CPU time = 719 ms, elapsed time = 739 ms.

Using text() early: CPU time = 406 ms, elapsed time = 473 ms.
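For context, here is a minimal sketch of how the text() version of the function might be called from inside a procedure body. It assumes dbo.XMLIdentifiers has already been altered as shown above. The #orders table and its columns are hypothetical stand-ins for whatever table the IDs are matched against. Whether materializing the IDs first helps in a given workload is something to test rather than assume.

```sql
-- Hypothetical stand-in for the real table the IDs are matched against.
CREATE TABLE #orders (OrderId int PRIMARY KEY, Note varchar(20));
INSERT INTO #orders (OrderId, Note) VALUES (1, 'one'), (2, 'two'), (5, 'five');

DECLARE @ids xml = N'<ids><id>1</id><id>2</id><id>3</id></ids>';

-- Shred the XML once into a temp table; the primary key removes the need
-- for the optimizer to guess at uniqueness when joining.
CREATE TABLE #ids (id int PRIMARY KEY);

INSERT INTO #ids (id)
SELECT DISTINCT x.id
FROM dbo.XMLIdentifiers(@ids) AS x
WHERE x.id IS NOT NULL;

-- Match the parsed IDs against the stand-in table.
SELECT o.OrderId, o.Note
FROM #orders AS o
    INNER JOIN #ids AS i ON i.id = o.OrderId;

DROP TABLE #ids;
DROP TABLE #orders;
```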

Related Solutions

SQL Server – Passing Array Parameters to Stored Procedure

The best articles on this matter are by Erland Sommarskog. He covers all the options and explains them well.

Sorry for the shortness of the answer, but Erland's article on Arrays is like Joe Celko's books on trees and other SQL treats :)
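One option commonly discussed for this problem is a table-valued parameter, which avoids parsing a string or XML document altogether. The following is only a sketch: dbo.IdList and dbo.GetObjectsByIds are hypothetical names, and sys.objects is used purely as a stand-in table so that the example runs on any database.

```sql
-- Hypothetical table type that carries the list of IDs.
CREATE TYPE dbo.IdList AS TABLE (id int NOT NULL PRIMARY KEY);
GO

-- Hypothetical procedure; table-valued parameters must be declared READONLY.
CREATE PROCEDURE dbo.GetObjectsByIds
    @ids dbo.IdList READONLY
AS
BEGIN
    SET NOCOUNT ON;

    -- sys.objects is a stand-in for the real table being filtered.
    SELECT o.object_id, o.name
    FROM sys.objects AS o
        INNER JOIN @ids AS i ON i.id = o.object_id;
END;
GO

-- Caller side: fill a variable of the table type and pass it in.
DECLARE @list dbo.IdList;
INSERT INTO @list (id) VALUES (3), (5), (7);
EXEC dbo.GetObjectsByIds @ids = @list;
```

Client libraries generally pass such a parameter as a structured value, so the application side needs a matching change.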

Sql-server – deteriorating stored procedure running times

What is up with FROM part JOIN model ON 1=1? This is the same as FROM part, model, which is a cartesian join and will result in a very large number of rows. Is that join supposed to be like that?
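The equivalence is easy to check on a small scale. In this sketch the table variables @part and @model are illustrative stand-ins for the real tables:

```sql
-- Illustrative stand-ins for the part and model tables.
DECLARE @part  TABLE (id int);
DECLARE @model TABLE (id int);

INSERT INTO @part  (id) VALUES (1), (2), (3);
INSERT INTO @model (id) VALUES (10), (20);

-- All three forms pair every part row with every model row: 3 x 2 = 6 rows.
SELECT COUNT(*) AS join_on_1_eq_1 FROM @part AS p INNER JOIN @model AS m ON 1 = 1;
SELECT COUNT(*) AS comma_join     FROM @part AS p, @model AS m;
SELECT COUNT(*) AS cross_join     FROM @part AS p CROSS JOIN @model AS m;
```

If the cartesian product is intentional, writing it as CROSS JOIN makes that intent explicit to the next reader.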

You will likely help us help you if you provide details about the tables involved. Please "script" the definition of the tables, along with any indexes defined on those tables.

This sounds like a classic case of parameter sniffing resulting in good plan/bad plan choices for various scenarios in your data.

You may be able to get more reliable performance by having SQL Server cache a different plan for each scenario. One way to do that is to build the statement dynamically and run it with sp_executesql, as in the following example:

CREATE PROCEDURE [dbo].[create_grid_materials2] 
(
    @partlistid bigint
    , @pid bigint
    , @masterid bigint
)
AS
BEGIN
    begin
        DECLARE @cmd NVARCHAR(MAX);

        SET @cmd = '   
        INSERT INTO material (partid, personid, modelID)
        SELECT 
            partid = part.id
            , personid = @pid
            , modelid = model.id  
        FROM part
            INNER JOIN model ON 1=1
        WHERE (
            model.masterid = ' + CONVERT(NVARCHAR(50), @masterid) + ' 
                AND model.modelSetID IS NULL
                AND part.partlistid = ' + CONVERT(NVARCHAR(50), @partlistid) + '
                AND (
                    part.partType = 100 
                    or part.partType=120 
                    or part.partType = 130
                )
            )
            AND NOT EXISTS (
                SELECT 1 
                FROM material AS a1 
                WHERE a1.partid = part.id 
                    AND a1.personid=@pid 
                    AND a1.modelid=model.id
                )';
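        -- sp_executesql requires its parameter definition to be Unicode (nvarchar);
        -- @pid is declared BIGINT to match the procedure parameter.
        DECLARE @Params NVARCHAR(200);
        SET @Params = N'@pid BIGINT';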
        EXEC sys.sp_executesql @cmd
            , @Params
            , @pid = @pid;
    end
End

Because @partlistid and @masterid are concatenated into the statement text as literals, each distinct combination of the two produces a different statement, so SQL Server compiles and caches a separate plan for each combination.

The presumption here is that some combinations of those two variables lead to a very small number of rows, whereas others lead to a very large number of rows.

Forcing a plan for each combination allows SQL Server to generate a more efficient plan for each. I've explicitly left @pid as a parameter, since you probably want to try this with a fairly small number of combinations first; adding a third variable to the mix would multiply the number of possible plans.
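One way to check whether separate plans really are being cached is to look in the plan cache after running the procedure with a few different argument values. This is a sketch that assumes the caller has VIEW SERVER STATE permission. The LIKE filter on 'INSERT INTO material' is simply a convenient way to find the statement built above.

```sql
-- List cached plans whose text contains the dynamic INSERT statement.
-- Statements run through sp_executesql with a parameter list are expected
-- to show up with objtype 'Prepared'; the procedure itself shows as 'Proc'.
SELECT
    cp.usecounts,        -- how many times this cached plan has been reused
    cp.objtype,
    cp.size_in_bytes,
    st.text
FROM sys.dm_exec_cached_plans AS cp
    CROSS APPLY sys.dm_exec_sql_text(cp.plan_handle) AS st
WHERE st.text LIKE N'%INSERT INTO material%'
    AND st.text NOT LIKE N'%dm_exec_cached_plans%'   -- exclude this query itself
ORDER BY cp.usecounts DESC;
```

One row per distinct pair of literal values suggests the approach is working as intended. A rapidly growing number of single-use rows is a sign that the plan cache is being filled with plans that are never reused.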
