SQL Server – Can a Query Be Refactored to Run in Parallel?

parallelismperformancequery-performancesql serversql-server-2016

I have a query that takes about 3 hours to run on our server–and it doesn't take advantage of parallel processing. (about 1.15 million records in dbo.Deidentified, 300 records in dbo.NamesMultiWord). The server has access to 8 cores.

  UPDATE dbo.Deidentified 
     WITH (TABLOCK)
  SET IndexedXml = dbo.ReplaceMultiWord(IndexedXml),
      DE461 = dbo.ReplaceMultiWord(DE461),
      DE87 = dbo.ReplaceMultiWord(DE87),
      DE15 = dbo.ReplaceMultiWord(DE15)
  WHERE InProcess = 1;

and ReplaceMultiword is a procedure defined as:

SELECT @body = REPLACE(@body,Names,Replacement)
 FROM dbo.NamesMultiWord
 ORDER BY [WordLength] DESC
RETURN @body --NVARCHAR(MAX)

Is the call to ReplaceMultiword preventing forming a parallel plan? Is there a way to rewrite this to allow parallelism?

ReplaceMultiword runs in descending order because some of the replacements are short versions of others, and I want the longest match to succeed.

For example, there may be 'George Washington University' and another from 'Washington University'. If the 'Washington University' match were first, then 'George' would be left behind.

query plan

Technically I can use CLR, I'm just not familiar with how to do so.

Best Answer

The UDF is preventing parallelism. It also is causing that spool.

You could use CLR and a compiled regex to do your search and replace. It doesn't block parallelism as long as the required attributes are present and will likely be significantly faster than performing 300 TSQL REPLACE operations per function call.

Example code is below.

DECLARE @X XML = 
(
    SELECT Names AS [@find],
           Replacement  AS [@replace]
    FROM  dbo.NamesMultiWord 
    ORDER BY [WordLength] DESC
    FOR XML PATH('x'), ROOT('spec')
);

UPDATE dbo.Deidentified WITH (TABLOCK)
SET    IndexedXml = dbo.ReplaceMultiWord(IndexedXml, @X),
       DE461 = dbo.ReplaceMultiWord(DE461, @X),
       DE87 = dbo.ReplaceMultiWord(DE87, @X),
       DE15 = dbo.ReplaceMultiWord(DE15, @X)
WHERE  InProcess = 1; 

This depends on the existence of a CLR UDF as below (the DataAccessKind.None should mean the spool disappears as well as that is there for Halloween protection and isn't needed as this doesn't access the target table).

using System;
using System.Data.SqlTypes;
using Microsoft.SqlServer.Server;
using System.Text.RegularExpressions;
using System.Collections.Generic;
using System.Xml;

public partial class UserDefinedFunctions
{
    //TODO: Concurrency?
    private static readonly Dictionary<string, ReplaceSpecification> cachedSpecs = 
                        new Dictionary<string, ReplaceSpecification>();

    [SqlFunction(IsDeterministic = true,
                 IsPrecise = true,
                 DataAccess = DataAccessKind.None,
                 SystemDataAccess = SystemDataAccessKind.None)]
    public static SqlString ReplaceMultiWord(SqlString inputString, SqlXml replacementSpec)
    {
        //TODO: Implement something to drop things from the cache and use a shorter key.
        string s = replacementSpec.Value;
        ReplaceSpecification rs;

        if (!cachedSpecs.TryGetValue(s, out rs))
        {
            var doc = new XmlDocument();
            doc.LoadXml(s);
            rs = new ReplaceSpecification(doc);
            cachedSpecs[s] = rs;
        }

        string result = rs.GetResult(inputString.ToString());
        return new SqlString(result);
    }


    internal class ReplaceSpecification
    {
        internal ReplaceSpecification(XmlDocument doc)
        {
            Replacements = new Dictionary<string, string>();

            XmlElement root = doc.DocumentElement;
            XmlNodeList nodes = root.SelectNodes("x");

            string pattern = null;
            foreach (XmlNode node in nodes)
            {
                if (pattern != null)
                    pattern = pattern + "|";

                string find = node.Attributes["find"].Value.ToLowerInvariant();
                string replace = node.Attributes["replace"].Value;
                 //TODO: Escape any special characters in the regex syntax
                pattern = pattern + find;
                Replacements[find] = replace;
            }

            if (pattern != null)
            {
                pattern = "(?:" + pattern + ")";
                Regex = new Regex(pattern, RegexOptions.IgnoreCase | RegexOptions.Compiled);
            }


        }
        private Regex Regex { get; set; }

        private Dictionary<string, string> Replacements { get; set; }


        internal string GetResult(string inputString)
        {
            if (Regex == null)
                return inputString;

            return Regex.Replace(inputString,
                                 (Match m) =>
                                 {
                                     string s;
                                     if (Replacements.TryGetValue(m.Value.ToLowerInvariant(), out s))
                                     {
                                         return s;
                                     }
                                     else
                                     {
                                         throw new Exception("Missing replacement definition for " + m.Value);
                                     }
                                 });
        }
    }
}