I am trying to split two columns with delimited strings into rows. The positions of the values in each string are related so I am trying to split it out so that the related values are in a row. I am unable to use function as I cannot create objects in the database
Here is sample table and data
CREATE TABLE #temp
(id INT,
keys VARCHAR(50),
vals VARCHAR(50)
);
INSERT INTO #temp
VALUES
(1, '1,2,3', 'one,two,three'),
(2, '4,5,6', 'four,five,six'),
(3, '7,8,9', 'seven,eight,nine');
and my desired output would be
ID key val
1 1 one
1 2 two
1 3 three
2 4 four
2 5 five
2 6 six
3 7 seven
3 8 eight
3 9 nine
I got the query to work if I only split one column, so I define two CTEs with row_number and join on ID and row_number. This does give desired output but my live table is very large and I was hoping for a way to pass through the table only once, instead of twice.
with keys as(
SELECT id,keys,vals,
keys.keyid.value('.', 'VARCHAR(8000)') AS keyid,
row_number() over(order by (select null)) as rn
FROM
(SELECT id,keys,vals,
CAST('<Keys><key>'+REPLACE(keys, ',', '</key><key>')+'</key></Keys>' AS XML) AS tempkeys
FROM #temp
) AS temp
CROSS APPLY tempkeys.nodes('/Keys/key') AS keys(keyid)),
vals as(
SELECT id,keys,vals,
vals.val.value('.', 'VARCHAR(8000)') AS valid,
row_number() over(order by (select null)) as rn
FROM
(SELECT id,keys,vals,
CAST('<vals><val>'+REPLACE(vals, ',', '</val><val>')+'</val></vals>' AS XML) AS tempvals
FROM #temp
) AS temp
CROSS APPLY tempvals.nodes('/vals/val') AS vals(val))
SELECT k.id, k.keyid, v.valid
FROM keys AS k
INNER JOIN vals AS v
ON k.id = v.id
AND k.rn = v.rn;
Best Answer
Create the function in
msdb
or somewhere else.Then, as @gbn noted, reference it by 3-part name wherever your query has to run.
Results:
The resulting plan, shown in Plan Explorer (disclaimer: I'm the Product Manager), is not the prettiest thing I've ever seen (click to enlarge a little bit):
But there is exactly one scan of #temp (4% cost). The biggest costs are two sorts and a spool, and there is some I/O due to a worktable which I am not sure is avoidable.
If you KNOW you will only ever have 50 characters in either of these strings, then you can get a much simpler plan with a built-in
Numbers
table (people object to these, but they're very useful, and they are almost always in memory if you reference them enough). This doesn't help I/O but removing the recursive CTE and other constructs of building the numbers inside the function is quite helpful for CPU etc.First, the numbers table:
Then a second version of the function:
Here is the simpler plan that results (again, click to enlarge):
The plan still has two sort operations, but the spool is gone, there is still only one scan of
#temp
, and in my limited tests the cost numbers (absolute cost numbers, not %) were better every time.I don't know precisely either of these will scale with a lot more rows, but it's worth testing, and if you weigh this against other solutions and it can't scale well, that may be a point your reconsider the design (store these relationally instead of as comma-separated sets).