SQL Server – Selecting from a Table with XML Column

importsql serverxml

I have received an Excel file (verion 2010) and based on the requirement I was asked to save it as an XML file and then import into SQL data ready to be queried in normal SELECT statements.

Below is the list of steps I followed but now that I have a SELECT on the data, no records is returned. I'm not sure whether it's due to the conversion that I did to get an XML file from an Excel file, or there is a problem in my OPENXML query.

  1. To obtain a XML file from the .xlsx file, I opend the file in
    EXCEL 2010 and saved it in 'XML spread sheet 2003 (*.XML)'.
  2. I imported the XML file in a table using the code below:

    INSERT INTO XMLwithOpenXML(XMLData, LoadedDateTime)
    SELECT CONVERT(XML, BulkColumn) AS BulkColumn, GETDATE() 
    FROM OPENROWSET(BULK 'C:\DEV\TestXML.xml', SINGLE_BLOB) AS x;
    
    
    SELECT * FROM XMLwithOpenXML
    

enter image description here
1. The XML data has been stored now in the table, so I use the code below to read it in a SELECT statement to access the individual columns:

    DECLARE @XML AS XML, @hDoc AS INT, @SQL NVARCHAR (MAX)


 SELECT @XML = XMLData FROM XMLwithOpenXML

    EXEC sp_xml_preparedocument @hDoc OUTPUT, @XML


    SELECT SNo,Salutation,PatientNRIC,FirstName,LastName,Gender,Race
    FROM OPENXML(@hDoc, 'ROOT/SNo/Salutation/PatientNRIC/FirstName/LastName/Gender/Race')
    WITH 
    (
        SNo [nvarchar](50) '@SNo',
        Salutation [nvarchar](100) '@Salutation',
        PatientNRIC [nvarchar](255) '@PatientNRIC',
        FirstName [nvarchar](255) '@FirstName',
        LastName [nvarchar](255) '@LastName',
        Gender [nvarchar](255) '@Gender',
        Race [nvarchar](255) '@Race'
    )   

Then

EXEC sp_xml_removedocument @hDoc
    GO

However what I get is only the name of columns, the actual data has not been returned:

enter image description here

I am totally new to OPENXML comment and working with XML files, tried reading different posts but still couldn't figure out why no data is shown in my select query. Any help is appreciated.

Best Answer

As your XML is held in a table you can use the methods of the XML data-type ( eg .nodes, .value, .query etc ). The thing you might struggle with a bit is namespaces in XML. Start here to learn more about them: Understanding XML Namespaces.

Here are a few example queries I did against a spreadsheet similar to yours. Try them and work through the Stairway as suggested by @MikaelEriksson and see if they start to make sense.

IF OBJECT_ID('tempdb.#tmp') IS NOT NULL
    DROP TABLE #tmp
GO


-- 1) Pull the data "as is" unpivoted, tag it with row/cell numbers
-- and optionally pivot it
;WITH XMLNAMESPACES ( 
    DEFAULT 'urn:schemas-microsoft-com:office:spreadsheet',
    'urn:schemas-microsoft-com:office:spreadsheet' AS ss   
    ), cte AS (
SELECT
    t.rawXMLId,
    ROW_NUMBER() OVER ( ORDER BY ( SELECT NULL ) ) rowNumber,
    r.c.query('.') rowXML
FROM XMLwithOpenXML t
    CROSS APPLY XMLData.nodes('Workbook/Worksheet[@ss:Name = "Sheet1"]') w(c)
        CROSS APPLY w.c.nodes('Table/Row') r(c)
)
SELECT
    c.rawXMLId,
    c.rowNumber,
    ROW_NUMBER() OVER ( PARTITION BY rowNUmber ORDER BY rowNumber ) cellNumber,
    d.c.value('.', 'VARCHAR(50)') AS cellData
INTO #tmp
FROM cte c
    CROSS APPLY c.rowXML.nodes('Row/Cell/Data') d(c)
GO


SELECT 'unpivoted' s, *
FROM #tmp

SELECT 
    'pivoted' s,
    rawXMLId, rowNumber, 
    [1] AS Sno,
    [2] AS Salutation,
    [3] AS PatientNRIC,
    [4] AS FirstName,
    [5] AS LastName,
    [6] AS Gender,
    [7] AS Race
FROM #tmp
PIVOT ( MAX(cellData) For cellNumber In ( [1], [2], [3], [4], [5], [6], [7] )  ) upvt
GO



-- 2) Use your knowledge about the data to specify the columns manually
;WITH XMLNAMESPACES ( 
    DEFAULT 'urn:schemas-microsoft-com:office:spreadsheet',
    'urn:schemas-microsoft-com:office:spreadsheet' AS ss   
    )
SELECT 
    t.rawXMLId,
    d.c.value('(Cell/Data/text())[1]', 'VARCHAR(50)') AS Sno,
    d.c.value('(Cell/Data/text())[2]', 'VARCHAR(50)') AS Salutation,
    d.c.value('(Cell/Data/text())[3]', 'VARCHAR(50)') AS PatientNRIC,
    d.c.value('(Cell/Data/text())[4]', 'VARCHAR(50)') AS FirstName,
    d.c.value('(Cell/Data/text())[5]', 'VARCHAR(50)') AS LastName,
    d.c.value('(Cell/Data/text())[6]', 'VARCHAR(50)') AS Gender,
    d.c.value('(Cell/Data/text())[7]', 'VARCHAR(50)') AS Race

FROM XMLwithOpenXML t
    CROSS APPLY XMLData.nodes('Workbook/Worksheet[@ss:Name = "Sheet1"]') w(c)
        --CROSS APPLY w.c.nodes('Table/Row') d(c)                   <-- include row header
        CROSS APPLY w.c.nodes('Table/Row[position() > 1]') d(c)     -- exclude row header

GO

My results:

Test results

Just a note about OPENXML, it can be good for larger pieces of XML but I tend to try and avoid it if possible due it's well known memory issues. Your particular example is not working because of the namespaces and you have specified the wrong path (Root etc). Post back if you really want to work with the OPENXML, but I would advise against it in this example as your data is held in a table - OPENXML can only work with once piece of XML at a time.