Sql-server – T-SQL LIKE Predicate failed to match with whitespace in XML converted varchar

Recently I tried to search for a particular pattern by casting XML data to varchar(max) (I'm aware this is not best practice) and found that it does not work as expected:

Setup

declare @container table(
    [Response] xml not null
);

declare @xml xml =
'<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://abc.com/xsd" xmlns:ns="http://abc.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
   <soapenv:Header>
     <ns:MessageHeader>
       <xsd:ID>ABC</xsd:ID>
       <xsd:Date>2018-12-31T23:59:59</xsd:Date>
     </ns:MessageHeader>
   </soapenv:Header>
   <soapenv:Body>
     <ns:MessageResponse>
       <ns:return>
         <xsd:ResponseList xsi:nil="true" />
       </ns:return>
     </ns:MessageResponse>
   </soapenv:Body>
 </soapenv:Envelope>';

insert into @container values (@xml);

This query works

select *
  from @container
 where cast(Response as varchar(max))
  like '%<xsd:ResponseList xsi:nil="true"%';

Note that the pattern stops 3 characters (i.e. ' />') before the end of the XML node.

But this one does not:

select *
  from @container
 where cast(Response as varchar(max))
  like '%<xsd:ResponseList xsi:nil="true" %' -- with space
    or cast(Response as varchar(max))
  like '%<xsd:ResponseList xsi:nil="true" />%'; -- whole XML node

I suspected this was due to escape characters and tried a few alternatives, but to no avail. I would appreciate it if someone could shed some light on this.

EDIT (ANSWERED)

The following query works, based on Mr. Browstone's insight:

select *
  from @container
 where cast(Response as varchar(max))
  like '%<xsd:ResponseList xsi:nil="true"/>%';

Here's my follow-up question on Code Review, which uses an XQuery expression:

T-SQL Verify whether XML node from SOAP request contains any child nodes

Best Answer

This is by design.

When you store a document using the XML data type, it is compressed and organised into a structure that SQL Server can operate on efficiently. One of the steps in that process is generating the InfoSet, during which SQL Server discards anything it considers unnecessary; in your example, that is whitespace. The documentation says:

The InfoSet content may not be an identical copy of the text XML, because the following information is not retained: insignificant white spaces, order of attributes, namespace prefixes, and XML declaration.

When you select the entire contents of the field (such as when you convert it to NVARCHAR(MAX)), SQL Server rebuilds the XML document before returning it. This document may not be an identical copy of the one you inserted. For example, if you have used self-closing elements, SQL Server may return opening and closing elements instead. In your case, the space before '/>' is not retained, which is why a pattern ending in '"true"/>' matches while '"true" />' does not.

The documentation goes on to say:
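To see what the rebuilt text looks like, it can help to cast a small XML value back to a string and test both patterns against it. The following is a minimal sketch, not taken from the original post: the element name item and the attribute flag are made up for illustration, and the comments describe the behaviour reported in the question rather than a guaranteed result.

```sql
-- Hypothetical sample document; "root", "item" and "flag" are illustrative names only.
declare @x xml = N'<root><item flag="true" /></root>';

-- Inspect the text that SQL Server rebuilds from its internal representation.
select cast(@x as nvarchar(max)) as Serialized;

-- Test both spellings of the self-closing tag against the rebuilt text.
-- In the question, only the spelling without the space matched.
select case when cast(@x as nvarchar(max)) like N'%flag="true"/>%'
            then 1 else 0 end as MatchesWithoutSpace,
       case when cast(@x as nvarchar(max)) like N'%flag="true" />%'
            then 1 else 0 end as MatchesWithSpace;
```

Looking at the Serialized column first is usually the quickest way to find out which pattern a LIKE predicate has to match.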

Example: Retaining Exact Copies of XML Data

For illustration, assume that government regulations require you to retain exact textual copies of your XML documents. For example, these could include signed documents, legal documents, or stock transaction orders. You may want to store your documents in a [n]varchar(max) column.

So, if you need to store an exact copy of your document, NVARCHAR(MAX) or VARCHAR(MAX) is the best option. You can then convert it to XML when you need to query it (though this can be costly).
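A minimal sketch of that approach, assuming a hypothetical table variable @raw with a ResponseText column (neither appears in the original question): the text is stored as typed, so LIKE sees the original whitespace, and the value is cast to XML only when an XML method is needed.

```sql
-- Hypothetical storage for the exact text of each document.
declare @raw table (
    [ResponseText] nvarchar(max) not null
);

insert into @raw values (N'<root><item flag="true" /></root>');

-- The stored string is untouched, so the pattern with the space can match.
select *
  from @raw
 where ResponseText like N'%flag="true" />%';

-- Convert on demand to use XML methods; this cast is the potentially costly part.
select cast(ResponseText as xml).value('(/root/item/@flag)[1]', 'varchar(5)') as Flag
  from @raw;
```

The trade-off is that every query needing XML methods pays for the conversion, and the text is not checked for well-formedness until it is cast.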

For more information, see the documentation on XML Data Type and Columns (SQL Server) and Define the Serialization of XML Data, which outlines the rules that SQL Server applies when converting XML to a string type.
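Since the question acknowledges that string matching on XML is not best practice, here is a hedged sketch of the same check done on the XML value itself with the exist() method, which does not depend on how the document is serialized. It reuses the namespaces from the question's sample; the trimmed-down document and the column alias HasNilResponseList are illustrative assumptions.

```sql
-- Shortened version of the SOAP response from the question.
declare @xml xml = N'<soapenv:Envelope xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/" xmlns:xsd="http://abc.com/xsd" xmlns:ns="http://abc.com" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <soapenv:Body>
    <ns:MessageResponse>
      <ns:return>
        <xsd:ResponseList xsi:nil="true" />
      </ns:return>
    </ns:MessageResponse>
  </soapenv:Body>
</soapenv:Envelope>';

-- Declare the prefixes used inside the XQuery expression.
with xmlnamespaces (
    'http://abc.com/xsd' as xsd,
    'http://www.w3.org/2001/XMLSchema-instance' as xsi
)
-- exist() returns 1 when at least one node satisfies the path expression.
select case when @xml.exist('//xsd:ResponseList[@xsi:nil="true"]') = 1
            then 1 else 0 end as HasNilResponseList;
```

Because the predicate works on nodes and attributes rather than text, whitespace inside the tag and the order of attributes no longer matter.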