TSQL 2014 – Importing XML with Accented and Punctuation Characters

sql serversql server 2014xml

Using this solution from Forrest:

DECLARE @XML xml = 
'<?xml version="1.0" encoding="UTF-8"?>
   <Orders>
    <Order>
        <OrderID>334</OrderID>
        <AmountPaid currencyID="EUR">17.10</AmountPaid>
        <UserID>marc58</UserID>
        <ShippingAddress>
            <Name>Marc Juppé</Name>
            <Address>Rue  garçonneé III° arrondissement</Address>
            <City>Paris</City>
            <StateOrProvince></StateOrProvince>
            <Country>FR</Country>
            <Phone>333333333</Phone>
            <PostalCode>22222</PostalCode>
         </ShippingAddress>
         <ShippingCosts>4.50</ShippingCosts>
         <Items>
            <Item>
               <Details>
                    <ItemID>3664</ItemID>
                    <Store>47</Store>
                    <Title>MCPU DDA010</Title>
                    <SKU>mmx</SKU>
                </Details>
                <Quantity>1</Quantity>
                <Price currencyID="EUR">6.2</Price>
            </Item>
            <Item>
               <Details>
                    <ItemID>3665</ItemID>
                    <Store>45</Store>
                    <Title>MCPU DFZ42</Title>
                    <SKU>mmy</SKU>
                </Details>
                <Quantity>2</Quantity>
                <Price currencyID="EUR">3.2</Price>
            </Item>
        </Items>
    </Order>
</Orders>'

SELECT 
    x.value('./ItemID[1]','int') AS ItemID,
    x.value('./Store[1]','int') AS Store,
    x.value('./Title[1]','nvarchar(100)') AS Title,
    x.value('./SKU[1]','nvarchar(100)') AS SKU,
    x.value('../Quantity[1]','int') AS Qty,
    x.value('../Price[1]','decimal(11,2)') AS Price,
    x.query('//OrderID[1]').value('.','int') AS OrderID,
    x.query('//AmountPaid[1]').value('.','decimal(11,2)') AS AmountPaid,
    x.query('//UserID[1]').value('.','nvarchar(100)') AS UserID,
    x.query('//ShippingCosts[1]').value('.','decimal(11,2)') AS ShippingCosts
FROM @XML.nodes('//Item/Details') i(x)

I receive an error like

XML parsing: line 162, character 34, illegal xml character

when need to parse fields containing accented characters (àèòìùé) or °

I tried converting é int &egrave; but get this error:

XML parsing: line 162, character 41, well formed check: undeclared entity

while if I convert to &#232; it works.

The problem is I cannot HTMLEncode the XML File since I expect it will convert also all the "<" and ">" of the XML Structure, and probably also something else.

Can suggest the best solution to HTMLEncode the XML before parsing?

Best Answer

The XML you have is invalid in UTF-8 encoding. The accented characters needs to be encoded. For instance ° should be encoded as °.

Here is a shorter version that also fails.

declare @X xml = '<?xml version="1.0" encoding="UTF-8"?>°';
select @X.value('text()[1]', 'nchar(1)');

Msg 9420, Level 16, State 1, Line 1 XML parsing: line 1, character 39, illegal xml character

Should be like this instead if you use UTF-8.

declare @X xml = '<?xml version="1.0" encoding="UTF-8"?>°';
select @X.value('text()[1]', 'nchar(1)');

You need to trace the source of the data backwards to the producer to see where you end up with invalid XML.