I previously asked a question about an error I was recieving. You don't really need it to understand this question but it's here for reference:
The previous xml is a little complex and probably would benefit from a transformation so I applied an XSLT template to get the below structure and changed the tags a little so its more understandable. I've also restructured the table I'm importing to, for maintainability. I imported the transformed XML file to a SQL Server table, xTable
, with column xData
, Like so (only one row but I suppose you could import more than 1 and merge them all with David Browne's answer):
ID xData
1 <MyXMLFile><Sample><Location>....
The parent node of the xml, <Sample>
, can be repeated up to 1 million times but for this illustration, I only have 2. There are 22 child nodes for each sample, one <SampleID>
node and 21 <Location>
nodes (I've only shown 2 nodes to keep things short). There are 3 child nodes for each node, one <LocationName>
node and two <Foo>
nodes, designated <Foo1>
and <Foo2>
.
<?xml version="1.0" encoding="UTF-16"?>
<MyXMLFile>
<!--There CAN BE up to 1 million <Sample> nodes-->
<Sample>
<!--There ARE EXACTLY 22 child nodes for each <Sample> parent node, one <SampleID> and 21 <Location>-->
<SampleID>0000001A</SampleID>
<!--There ARE EXACTLY 3 child nodes for each <Location> parent node, on <LocationID> and two <Foo>-->
<Location>
<LocationName>Jeff</LocationName>
<Foo1>10</Foo1>
<Foo2>11</Foo2>
</Location>
<Location>
<LocationName>Jenn</LocationName>
<Foo1>11</Foo1>
<Foo2>12</Foo2>
</Location>
</Sample>
<Sample>
<SampleID>0000002A</SampleID>
<Location>
<LocationName>Greg</LocationName>
<Foo1>13</Foo1>
<Foo2>14</Foo2>
</Location>
<Location>
<LocationName>Anne</LocationName>
<Foo1>14</Foo1>
<Foo2>16</Foo2>
</Location>
</Sample>
</MyXMLFile>
I want to convert the xData
column from xTable
and put it into this table (ID column for illustration only):
ID SampleID LocationName Foo1 Foo2
1 00000001 Jeff 10 11
2 00000001 Jenn 11 12
… 00000001 … … …
22 00000001 … … …
23 00000002 Greg 13 14
24 00000002 Anne 17 18
… 00000002 … … …
44 00000002 … … …
At the moment, I'm just trying to SELECT
the xData
column from xTable
and will edit the query later to insert the data. So my first query, just to show that <SampleID>
does get selected:
Query 1
SELECT a.b.query('SampleID').value('.', 'varchar(20)') AS SampleID
FROM xTable
CROSS APPLY xData.nodes('MyXMLFile/Sample') as a(b)
The output looks good:
ID SampleID
1 00000001
2 00000002
So, I added to the query:
Query2
SELECT a.b.query('SampleID').value('.', 'varchar(20)') AS SampleID,
a.b.query('LocationName').value('.', 'varchar(10)') AS LocationName,
a.b.query('Foo1').value('.', 'varchar(6)') AS Foo1,
a.b.query('Foo2').value('.', 'varchar(6)') AS Foo2
FROM xTable
CROSS APPLY xData.nodes('MyXMLFile/Sample/SampleID/../Location') as a(b)
For this output, no data gets selected for <SampleID>
. This is not surprising to me as the xpath selection is only to the <Location>
parent node and returns its children <LocationName>
, <Foo1>
and <Foo2>
and not <SampleID>
.
ID SampleID LocationName Foo1 Foo2
1 Jeff 10 11
2 Jenn 11 12
… … … …
22 … … …
23 Greg 13 14
24 Anne 17 18
… … … …
44 … … …
So then I tried this:
Query 3
SELECT a.b.query('SampleID').value('.', 'varchar(20)') AS SampleID,
c.d.query('LocationName').value('.', 'varchar(10)') AS LocationName,
c.d.query('Foo1').value('.', 'varchar(6)') AS Foo1,
c.d.query('Foo2').value('.', 'varchar(6)') AS Foo2
FROM xTable
CROSS APPLY xData.nodes('MyXMLFile/Sample/SampleID') as a(b)
CROSS APPLY xData.nodes('MyXMLFile/Sample/SampleID/../Location') as c(d)
The output is a little better but the rows are duplicated in the table. There should only be 44, but there are 88:
ID SampleID LocationName Foo1 Foo2
1 00000001 Jeff 10 11
2 00000001 Jenn 11 12
… 00000001 … … …
42 00000001 … … …
43 00000001 … … …
44 00000001 … … …
45 00000002 Greg 13 14
46 00000002 Anne 17 18
… … … … …
88 00000002 … … …
Then I thought I would try a different way.
Query 4
DECLARE @x xml;
SELECT @x = xData
FROM xTable
SELECT a.b.value('(SampleID/text())[1]', 'varchar(20)') AS SampleID,
a.b.value('(LocationName/text())[1]', 'varchar(10)') AS LocationName,
a.b.value('(Foo1/text())[1]', 'varchar(6)') AS Foo1,
a.b.value('(Foo2/text())[1]', 'varchar(6)') AS Foo2
FROM @x.nodes('MyXMLFile/Sample') AS xData(a)
CROSS APPLY @x.nodes('MyXMLFile/Sample/SampleID/../Location') AS a(b)
Now, instead of blank SampleID
field or duplicated records, SampleID
came back NULL
and the data was duplicated:
ID SampleID LocationName Foo1 Foo2
1 NULL Jeff 10 11
2 NULL Jenn 11 12
… NULL … … …
42 NULL … … …
43 NULL … … …
44 NULL … … …
45 NULL Greg 13 14
46 NULL Anne 17 18
… NULL … … …
88 NULL … … …
So in a final attempt to select the right data, I tried this query:
Query 5
DECLARE @x xml;
SELECT @x = xData
FROM xTable
SELECT a.b.value('(SampleID/text())[1]', 'varchar(20)') AS SampleID,
c.d.value('(LocationName/text())[1]', 'varchar(10)') AS LocationName,
c.d.value('(Foo1/text())[1]', 'varchar(6)') AS Foo1,
c.d.value('(Foo2/text())[1]', 'varchar(6)') AS Foo2
FROM @x.nodes('MyXMLFile/Sample') AS xData(a)
CROSS APPLY @x.nodes('MyXMLFile/Sample') AS a(b)
CROSS APPLY @x.nodes('MyXMLFile/Sample/SampleID/../Location') AS c(d)
The result here is even more surprising to me, not only did the query populate all the fields but it
quadrupled the output:
ID SampleID LocationName Foo1 Foo2
1 00000001 Jeff 10 11
2 00000001 Jenn 11 12
… 00000001 … … …
… 00000001 … … …
… 00000001 … … …
44 00000001 … … …
45 00000002 Greg 13 14
46 00000002 Anne 17 18
47 00000002 … … …
48 00000002 … … …
… … … … …
176 00000002 … … …
I understand my problem to be incorporation of the two different xpaths into the query and my understanding and use of the derived tables in the query. Any help would be appreciated. How can I adjust these queries to get the table I need?
Thanks in advance.
EDIT:
At the advice of the answer of David Browne this works for me:
Query 6
INSERT INTO MyTable (SampleID, LocationName, Foo1, Foo2)
SELECT Sample.n.value('(SampleID)[1]', 'varchar(20)') AS SampleName,
Location.n.value('(LocationName/text())[1]', 'varchar(1)') AS LocationName,
Location.n.value('(Foo1/text())[1]', 'varchar(6)') AS Foo1,
Location.n.value('(Foo2/text())[1]', 'varchar(6)') As Foo2
FROM xTable AS x
CROSS APPLY x.xData.nodes('/MYXMLFile/Sample') AS Sample(n)
CROSS APPLY Sample.n.nodes('Location') AS Location(n)
Best Answer
The pattern is that each
cross apply
picks up the relative location of the parent. Try something like this:outputs