Sql-server – Syntax to find text in string, then pull out following text until a comma is identified

sql server

I am trying to write a query that will extract partial information out of a longer string, but the string is not always consistent so I need to identify certain parts and extract based on that.

The string is legal description info on property parcels and is structured like this: "Lot D, Plan VIP12345, Section 70, Suchandsuch District". I am trying to get the "VIP12345" part as a separate field.

Initially I considered exporting this field as a CSV table and separating the info into unique sections that way. But this dataset gets updated regularly by replacing the entire dataset, so every time it is updated I would have to export the table, make the field a CSV table, reimport that into separate fields, and then export that into something my GIS can join to.

I would rather have my GIS link to the original table with an SQL query to sort out where the Plan# is (the "VIP12345" part). However, the Plan# does not always start with VIP: sometimes it has no letters at the start, sometimes it is VIS, sometimes EPP. The Plan# is also not always the same number of digits (sometimes 4, sometimes 5, sometimes 6), so I can't just count the number of characters to extract from a certain point.

So I am thinking I need an SQL query that in plain language would do this: in field [FIXLGL], look for the string 'Plan ', then extract all characters following that until you see a ','.

I am having trouble figuring out what SQL acrobatics to use to accomplish this. I can identify records that have a Plan# ([FIXLGL] LIKE '%Plan %'), but then how do I tell the query that the important part is what comes after that?

Best Answer

If you're using SQL Server, you can use this.

If you're not, I'll delete my answer.

DECLARE @s NVARCHAR(100) = N'Lot D, Plan VIP12345, Section 70, Suchandsuch District';

SELECT @s AS string,
        SUBSTRING(@s,
                 CHARINDEX('VIP', @s),
                 CHARINDEX(',', @s, CHARINDEX('VIP', @s))
                 - CHARINDEX('VIP', @s)
                 ) AS sub;

How does this work?

We can use SUBSTRING and the results of CHARINDEX to get just the part of the string we're interested in.

CHARINDEX is a better choice here than PATINDEX because it accepts an optional third argument, which tells CHARINDEX at which position in the string to start searching. That lets us find the first comma after the plan number rather than the first comma in the whole string.
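To make the third argument concrete, here is a small sketch that returns the intermediate positions for the sample string. The column aliases are just illustrative names, not anything from the question.

```sql
DECLARE @s NVARCHAR(100) = N'Lot D, Plan VIP12345, Section 70, Suchandsuch District';

SELECT CHARINDEX(',', @s)                       AS first_comma,      -- 6: the comma after "Lot D"
       CHARINDEX('VIP', @s)                     AS vip_start,        -- 13: where "VIP" begins
       CHARINDEX(',', @s, CHARINDEX('VIP', @s)) AS comma_after_vip;  -- 21: search starts at position 13
```

The difference 21 - 13 = 8 is the length of "VIP12345", which is exactly what SUBSTRING needs as its third argument.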

        SUBSTRING(@s, --The string
                 CHARINDEX('VIP', @s), --Charindex of VIP
                 CHARINDEX(',', @s, CHARINDEX('VIP', @s)) --Charindex of the first comma _after_ VIP
                 - CHARINDEX('VIP', @s) --Subtract the position of VIP
                 )

Which gives us:

string                                                  sub
Lot D, Plan VIP12345, Section 70, Suchandsuch District  VIP12345
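Since the question says the plan number doesn't always start with VIP, the same approach can probably be anchored on the literal 'Plan ' instead. The following is only a sketch under a few assumptions: the table variable @parcels and its rows are made-up stand-ins for the real table, every plan number is preceded by 'Plan ' exactly as in the question, and whether 'Plan ' matches 'plan ' depends on the column's collation.

```sql
-- Hypothetical sample rows standing in for the real parcel table.
DECLARE @parcels TABLE (FIXLGL NVARCHAR(200));
INSERT INTO @parcels (FIXLGL) VALUES
    (N'Lot D, Plan VIP12345, Section 70, Suchandsuch District'),
    (N'Lot 2, Plan EPP1234, Section 5, Suchandsuch District'),
    (N'Lot 7, Section 12, Plan 987654'),          -- plan is last, so no trailing comma
    (N'Lot 9, Section 3, Suchandsuch District');  -- no plan at all

SELECT p.FIXLGL,
       CASE WHEN x.plan_pos > 0
            THEN SUBSTRING(p.FIXLGL,
                           x.plan_pos + 5,  -- skip the 5 characters of 'Plan ' (LEN would ignore the trailing space)
                           -- Appending a comma guarantees one is found even when the plan is the last item.
                           CHARINDEX(',', p.FIXLGL + N',', x.plan_pos + 5) - (x.plan_pos + 5))
       END AS plan_no                       -- NULL when the row has no 'Plan '
FROM @parcels AS p
CROSS APPLY (SELECT CHARINDEX(N'Plan ', p.FIXLGL) AS plan_pos) AS x;
```

For the four sample rows this returns VIP12345, EPP1234, 987654, and NULL. The CASE guard keeps rows without a plan from being sliced at a meaningless position.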

Hope this helps!