I have a query that takes about 3 hours to run on our server–and it doesn't take advantage of parallel processing. (about 1.15 million records in dbo.Deidentified
, 300 records in dbo.NamesMultiWord
). The server has access to 8 cores.
UPDATE dbo.Deidentified
WITH (TABLOCK)
SET IndexedXml = dbo.ReplaceMultiWord(IndexedXml),
DE461 = dbo.ReplaceMultiWord(DE461),
DE87 = dbo.ReplaceMultiWord(DE87),
DE15 = dbo.ReplaceMultiWord(DE15)
WHERE InProcess = 1;
and ReplaceMultiword
is a procedure defined as:
SELECT @body = REPLACE(@body,Names,Replacement)
FROM dbo.NamesMultiWord
ORDER BY [WordLength] DESC
RETURN @body --NVARCHAR(MAX)
Is the call to ReplaceMultiword
preventing forming a parallel plan? Is there a way to rewrite this to allow parallelism?
ReplaceMultiword
runs in descending order because some of the replacements are short versions of others, and I want the longest match to succeed.
For example, there may be 'George Washington University' and another from 'Washington University'. If the 'Washington University' match were first, then 'George' would be left behind.
Technically I can use CLR, I'm just not familiar with how to do so.
Best Answer
The UDF is preventing parallelism. It also is causing that spool.
You could use CLR and a compiled regex to do your search and replace. It doesn't block parallelism as long as the required attributes are present and will likely be significantly faster than performing 300 TSQL
REPLACE
operations per function call.Example code is below.
This depends on the existence of a CLR UDF as below (the
DataAccessKind.None
should mean the spool disappears as well as that is there for Halloween protection and isn't needed as this doesn't access the target table).