Sql-server – SQL Split Address Line into Separate Columns

querysql serversql-server-2016string-splittingsubstring

Whats the best way to split a comma list column into different columns? I am looking for a simple algorithm, heard it is best to avoid scalar value functions, as they might be slow. Searching into many methods, substring, find, splitstring; we need to run this algorithm across millions of addresses, so looking for best, optimal answer with good coding practice.

create table dbo.AddressTest (AddressLine varchar(255))

insert into dbo.AddressTest (AddressLine) 
values ('123 Maple Street, Austin, Texas, 78653, 555-234-4589')

-- may not require substring, just looking for good way
select 
    substring  as AddressStreet   -- Expected: 123 Maple Street
    substring(...) as City,       -- Expected: Austin
    substring(...) as State,     -- Expected: Texas
    substring(...) as ZipCode     -- Expected: 78653
    substring(...) as PhoneNumber -- Expected: 555-234-4589
from dbo.AddressTest

looking into this method also:
https://stackoverflow.com/questions/19449492/using-t-sql-return-nth-delimited-element-from-a-string

Best Answer

STRING_SPLIT() is probably best in SQL Server 2016. I've tested against 1 million rows and returns results in 12 seconds (fairly contrived test).

You can use the PIVOT operator to produce the columns from your rows following the split as there are a known number of elements.

Code to setup

CREATE TABLE #Table (ID INT IDENTITY, StreetAddress VARCHAR(MAX))

DECLARE @I INT = 1000000

WHILE (@I) > 0
BEGIN
       INSERT INTO #Table (StreetAddress) VALUES ('123 Fake St, BigCity, My State, 12345')
       SET @I = @I -1
END

Code to retrieve split address elements

SELECT ID,
       [1] AS Street,
       [2] AS City, 
       [3] AS State, 
       [4] AS Code
FROM 
(
       SELECT ID, value, ROW_NUMBER() OVER (PARTITION BY ID ORDER BY ID) AS Rn
       FROM #Table
       CROSS APPLY STRING_SPLIT(REPLACE(StreetAddress, ', ', ','), ',')
) src
PIVOT
(
       MAX(Value) FOR Rn IN ([1], [2], [3], [4])
) pvt

You can probably improve performance by eliminating the REPLACE in STRING_SPLIT, but you'll then need to TRIM your columns to remove whitespace.