Sql-server – XML Parsing Illegal XML character error in SQL 2008

sql serversql-server-2008

I was running this query in SQL Server 2008 (10.0.5890),
Collation: SQL_Latin1_General_CP1_CI_AS

SELECT
   CAST(event_data.value('(event/data/value)[1]',
   'NVARCHAR(max)') AS XML) AS DeadlockGraph 
FROM
   ( SELECT
      XEvent.query('.') AS event_data 
   FROM
      ( -- Cast the target_data to XML SELECT
         CAST(target_data AS XML) AS TargetData 
      FROM
         sys.dm_xe_session_targets st 
      JOIN
         sys.dm_xe_sessions s 
            ON s.address = st.event_session_address 
      WHERE
         name = 'system_health' 
         AND target_name = 'ring_buffer' ) AS Data -- Split out the Event Nodes CROSS APPLY TargetData.nodes('RingBufferTarget/ event[@name="xml_deadlock_report"]') AS XEventData ( XEvent ) ) AS tab ( event_data )

But it was always ended with the following error:

Msg 9420, Level 16, State 1, Line 1
XML parsing: line 22203, character 102, illegal xml character

Even if I changed it to VARCHAR(max) it remains the same.
Though this same script was running fine and without any issue in another server with the same version & collation.

Sample XML result from the working database:

Proc [Database Id = 5 Object Id = 2033194789]

    </process>
    <process id="process55d288" taskpriority="0" logused="584" waitresource="KEY: 5:72057594332250112 (6b0281c999d7)" waittime="1281" ownerId="1679" transactionname="EventGetQueueTopSp" lasttranstarted="2016-07-02T20:08:56.350" XDES="0x8209f990" lockMode="U" schedulerid="4" kpid="2880" status="suspended" spid="58" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2016-07-02T20:09:12.370" lastbatchcompleted="2016-07-02T20:08:59.090" lastattention="1900-01-01T00:00:00.090" clientapp="EventService Queue Check" hostname="WVERPDB" hostpid="3148" loginname="sa" isolationlevel="read uncommitted (1)" xactid="1679" currentdb="5" lockTimeout="4294967295" clientoption1="671221856" clientoption2="128056">
      <executionStack>
        <frame procname="" line="96" stmtstart="8066" stmtend="8714" sqlhandle="0x0300050009a8007d9eadc300449e00000100000000000000" />
      </executionStack>
      <inputbuf>
</deadlock>

Here is the SQL version for both DB

Microsoft SQL Server 2008 (SP3) - 10.0.5890.0 (X64)   Apr  2 2015 16:07:19   Copyright (c) 1988-2008 Microsoft Corporation  Standard Edition (64-bit) on Windows NT 6.1 <X64> (Build 7601: Service Pack 1) (VM) 

Best Answer

Here is how to find the "illegal" character:

  1. There are two places in your query that cast to XML, so start with confirming which one is throwing the error. I doubt it is the outer SELECT since that is just taking a subset of the data that is already valid XML. And, if it was the outer CAST, that would mean that an individual deadlock graph would have at least 22,203 lines it. Most likely the error line number that is leading you to the outer select is simply referencing the query, not a particular part of the query.

    Remove the CAST( and AS XML) parts of the outer SELECT. If the error still happens, it is coming from the CAST(target_data part, else it really is the outer part. OR, you can just highlight and run the inner-most SELECT (i.e. the one that is the derived table, Data).

  2. Next, we need to see the string that you are trying to CAST into XML. This is tricky as you can't display more than 64k characters in the Grid in SSMS, and using SQLCMD will get 1 million characters, but this data is likely well over even that limit.

    However, we can break it up into smaller pieces. As far as my testing shows, the fact that you have a line # in the XML that is above 1 indicates that there are newlines in the source value. So, we can capture the incoming target_data value from sys.dm_xe_session_targets and print it to the Messages tab of SSMS. Yes, the PRINT statement is limited to 8000 VARCHAR characters or 4000 NVARCHAR characters, but we can split up the original value on the newline characters and call PRINT on each chunk. I have the code for a Stored Procedure that does just this on PasteBin at T-SQL Stored Proc to PRINT NVARCHAR(MAX) values, which is based on my DBA.StackExchange answer How to output more than 4000 characters in sqlcmd.

    Create the DisplayN Stored Procedure and then run the following query:

    DECLARE @ShouldBeXml NVARCHAR(MAX);
    
    SELECT @ShouldBeXml = st.[target_data]
    FROM   sys.dm_xe_session_targets st
    INNER JOIN   sys.dm_xe_sessions s
            ON   s.[address] = st.[event_session_address]
    WHERE  s.[name] = 'system_health'
    AND    st.[target_name] = 'ring_buffer';
    
    EXEC dbo.DisplayN @ShouldBeXml;
    PRINT N'-------------------------------------';
    
    SELECT CONVERT(XML, @ShouldBeXml);
    
  3. Scroll on down to whatever line # is noted in the "XML parsing: line XXXXX" error message that is below the ------------ line in the "Messages" tab.

    If there is no obviously invalid character on that line (and you can try copying and pasting that line into a SELECT CONVERT(XML, N'{that_bad_line}'); statement to test out to confirm), then we can modify the DisplayN Stored Procedure to also output the VARBINARY hex values of each line so we can find hidden / control characters.