I have a legacy data source column that is delimitted by semicolons and commas. The first semicolon indicates the last name, the second indicates the first and middle name (or initials), and the last semicolon indicates the type of individual. The comma indicates that a new name has began. Here is a sample of this data.
+-------+---------------------------------------------------------------------------------------------------------------------+
| ID | SOURCE |
+-------+---------------------------------------------------------------------------------------------------------------------+
| 62963 | RENZ;MICHAEL;DECEASED,WANDER;MARIA;MINOR,WANDER;HENRY RUDOLPH;MINOR,WANDER;ROSA;MINOR,WANDER;PAUL EMIL;MINOR |
| 62964 | HERNDON;A C;ESTATE,BERRING;A F;DECEASED,BEIRING;A F;DECEASED,BEIRING;ANDREAS FREDERICK;DECEASED |
| 62965 | ZINCH;;ESTATE,ZINTZ;;ESTATE,HAYNES;HENRY;DECEASED |
| 62965 | ZINCH;;ESTATE,ZINTZ;;ESTATE,HAYNES;HENRY;DECEASED |
| 62966 | KRAUS;JOSEPHINE;MINOR,KENNEDY;GEORGE;DECEASED |
| 62967 | CAREY;JAMES;ESTATE,DE LA GARZA;REFUGIO;DECEASED |
| 62968 | LEWIS;FLORENCE;ESTATE,LOCKWOOD;ALBERT A;DECEASED |
| 62969 | GLAESER;EMMA;MINOR,GLAESER;HERMAN JR;MINOR,GLAESER;HERMAN;MINOR,RODRIGUEZ;HILARIO;DECEASED,RODRIGUEZ;MARIE;DECEASED |
| 62970 | STORY;BETTIE;ESTATE,EIGENDORFF;FRANZ;DECEASED |
| 62971 | HOWELL;MAMIE;MINOR,HOWELL;ETHEL;MINOR |
+-------+---------------------------------------------------------------------------------------------------------------------+
I am attempting to pull the data in a manner such as this so that it can be adapted to a different schema:
+-----------+------------+-------------+-------------------+----------+
| ID | SEQUENCE | LAST | FIRSTMIDDLE | TYPE |
+-----------+------------+-------------+-------------------+----------+
| 62963 | 1 | RENZ | MICHAEL | DECEASED |
| 62963 | 2 | WANDER | MARIA | MINOR |
| 62963 | 3 | WANDER | HENRY RUDOLPH | MINOR |
| 62963 | 4 | WANDER | ROSA | MINOR |
| 62963 | 5 | WANDER | PAUL EMIL | MINOR |
| 62964 | 1 | HERNDON | A C | ESTATE |
| 62964 | 2 | BERRING | A F | DECEASED |
| 62964 | 3 | BEIRING | A F | DECEASED |
| 62964 | 4 | BEIRING | ANDREAS FREDERICK | DECEASED |
| 62965 | 1 | ZINCH | | ESTATE |
| 62965 | 2 | ZINTZ | | ESTATE |
| 62965 | 3 | HAYNES | HENRY | DECEASED |
| 62966 | 1 | KRAUS | JOSEPHINE | MINOR |
| 62966 | 2 | KENNEDY | GEORGE | DECEASED |
| 62967 | 1 | CAREY | JAMES | ESTATE |
| 62967 | 2 | DE LA GARZA | REFUGIO | DECEASED |
| 62968 | 1 | LEWIS | FLORENCE | ESTATE |
| 62968 | 2 | LOCKWOOD | ALBERT A | DECEASED |
| 62969 | 1 | GLAESER | EMMA | MINOR |
| 62969 | 2 | GLAESER | HERMAN JR | MINOR |
| 62969 | 3 | GLAESER | HERMAN | MINOR |
| 62969 | 4 | RODRIGUEZ | HILARIO | DECEASED |
| 62969 | 5 | RODRIGUEZ | MARIE | DECEASED |
| 62970 | 1 | STORY | BETTIE | ESTATE |
| 62970 | 2 | EIGENDORFF | FRANZ | DECEASED |
| 62971 | 1 | HOWELL | MAMIE | MINOR |
| 62971 | 2 | HOWELL | ETHEL | MINOR |
+-----------+------------+-------------+-------------------+----------+
This type of data extraction is something I am not all too familiar with. I am thinking I need to use a complex combination of SUBSTRING
and CHARINDEX
, but given that the number of entries that the source column can contain varies, I am not sure how best to approach this. Any guidance on where I should begin would be incredibly helpful.
Best Answer
As long as the legacy data is as described and does not have any odd variations, the following should work as it produces the desired output.
Notes:
WHILE
loop).Test setup:
Main query: