SQL Server – Manipulating Column Containing Key/Value Pairs

enterprise-editionjoin;sql serversql-server-2016view

I'm accessing and creating reports from a vendor via a replicated SQL Server database. They've done some absolutely insane things that I've been trying to solve for, but this one takes the cake.

They have a table that has many standard columns. But this table also has a column called "Data". The column is a legacy "text" data type, and it contains a giant (hundreds) list of key/value pairs. Each pair is separated by a CRLF, and the key and value are separated by an equal sign. Example:

select myTable.[data] from myTable where tblKey = 123

Result:

Key 1=Value 1
Key 2=Value 2
Key 3=Value 3
...
Key 500=Value 500

I'm trying to determine the most efficient way to break that column out into a usable table of data. The end goal would be to be able to query the table in a way that returns the table key along with specified key/values as column/fields as such:

tblKey | [Key 1] | [Key 3] | [Key 243]
-------|---------|---------|-----------
 123     Value 1   Value 3   Value 243
 124     Value 1   Value 3   Value 243
 125     Value 1   Value 3   Value 243

Is there a way to mold that column into a View? I can't imagine that a Function would be particularly efficient, but I'm sure I could parse things out that way using a string_split or something of that sort. Has anyone run into this type of atrocity before and found a good way to manipulate it into usable data?

Edit to add dbfiddle sample data.

The data is replicated from a vendor's source, so I can't create new tables. I can create views, procedures and functions. That's what I'm looking for advice for a decent way to accomplish.

Best Answer

UPDATE

If as you posted in your own answer you're able to use a UDF to get specific key values let me suggest this one: (You don't need to split all key/values and you don't need to read the table again, you can get it by using text functions.)

CREATE FUNCTION fnGetKey(@Data text, @Key varchar(20))
RETURNS varchar(100)
AS
BEGIN

  RETURN
  (
  SELECT 
      SUBSTRING (
                  @Data,
                  /* Position of first '=' after key + 1 */
                  CHARINDEX('=', @Data, PATINDEX('%' + @key + '%', @Data)) + 1,
                  /* Lenght, Position of first chr(13) after key less previuos value - 1 */
                  (CHARINDEX(CHAR(13), @Data, PATINDEX('%' + @key + '%', @Data)) 
                  - 
                  CHARINDEX('=', @Data, PATINDEX('%' + @key + '%', @Data))) - 1
                )
  )

END

SELECT
    FruitID, Name, Description,
    dbo.fnGetKey([Data], 'key 2') as [key 2],
    dbo.fnGetKey([Data], 'key 4') as [key 4]
FROM
    [Fruit];

FruitID | Name   | Description | key 2   | key 4  
------: | :----- | :---------- | :------ | :------
      1 | Banana | Delicious   | value 2 | value 4
      2 | Pear   | Rotton      | value 2 | value 4
      3 | Kiwi   | Okay        | value 2 | value 4

db<>fiddle here

Original answer

The only solution I can figure out is by splitting key/values and then pivot it to obtain the desired result.

Unfortunately there are some inconveniences:

STRING_SPLIT doesn't works with text columns. Hence you must cast it to varchar before you are able to manipulate it.
STRING_SPLIT requires a nchar(1) or nvarchar(1), ergo you should replace CHAR(3)+CHAR(10) by a single character.
Aggregate function on PIVOT works better with numeric values, then you should cast Value to some numeric data type.
PIVOT needs a well-known number of columns, in my example I have used a few of them but you should write the whole sequence unless you'd rather deal with dynamic queries.

This is what I've got using your sample data:

WITH KP AS
(
    SELECT FruitID, Name, Description, value as KPair
    FROM   Fruit
    CROSS APPLY STRING_SPLIT(REPLACE(CAST(Data AS varchar(max)), CHAR(13)+CHAR(10), ','), ',') /* STRING_SPLIT only allows nchar(1),  varchar(1) */
)
, KP1 AS
(
  SELECT
      FruitID,  
      SUBSTRING(KPair, 5, CHARINDEX('=', KPair) - 5) AS [Key],
      SUBSTRING(KPair, CHARINDEX('=', KPair) + 7, LEN(KPair) - CHARINDEX('=', KPair) - 6) AS [Value]
  FROM
      KP
)
SELECT [FruitID], [1],[2],[3],[4],[5]
FROM   KP1
PIVOT (MAX([Value]) FOR [Key] IN ([1],[2],[3],[4],[5])) AS PVT;

First CTE split every Key X=Value Y. The second one cut this value to obtain each [Key] and [Value]. And the final PIVOT compose the final result in columns.

FruitID | 1  | 2  | 3  | 4  | 5 
------: | :- | :- | :- | :- | :-
      1 | 1  | 2  | 3  | 4  | 5 
      2 | 1  | 2  | 3  | 4  | 5 
      3 | 1  | 2  | 3  | 4  | 5

db<>fiddle here

NOTE: I'm not sure if I should maintain [Key 1] & [Value 1] or it should be converted as a column named [Key] & [Value].

A different approach

When I work with 3rd party databases I usually add a new database, on the same server/instance if possible, and then I use it for my own purposes, just to avoid conflicts with the DB owners.

In this case you could add a new table and periodically throw a process to update it with the new values.

You could use a table with all columns:

CREATE TABLE [FruitKeys]
(
    [FruitID] int NOT NULL PRIMARY KEY,
    [V1]      int NULL,
    [V2]      int NULL,
    [V3]      int NULL,
    [V4]      int NULL,
    [V5]      int NULL
);

or a table with Key/Value pairs and use a pivot to obtain the final result:

CREATE TABLE [FruitKeys]
(
    [FruitID] int NOT NULL,
    [Key]     int NOT NULL,
    [Value]   int NOT NULL,
    CONSTRAINT [PK_FruitKeys] PRIMARY KEY ([FruitID], [Key])
);

Related Solutions

Sql-server – Search on part of a column in a SQL Server cross-database view

Are you allowed to replicate that table in another database? Using transactional replication, for latest data access. If yes, then I'd test a more cumbersome process like replicating the needed table, building another table or materialized view with a computed column as you need.

If replication is not an option, I'd say that you need to establish a scheduled process to take that table in your database and continue locally to process the data. Exactly like the first suggestion, but using your own scripts and schedule.

That, or I think you're stick to the scalar functions that aren't really the best option from performance point of view.

SQL Server Update Column with Datetime – How to Use Row Values

It looks like you have at least two different regional date formats there, possibly USA and Europe? You could just split the updates by languages...

something like this, which uses the language setting to control isdate and convert.

set language english;
update test_dates 
set ToDateTime = convert(datetime,ContainedData)
where AID='123' and PID='2' and isdate(ContainedData)=1;

set language british;
update test_dates 
set ToDateTime = convert(datetime,ContainedData)
where AID='123' and PID='2' and isdate(ContainedData)=1;

If the table is too large to update in big chunks, then you should look at wrapping that in a loop with a row limit and a transaction around each update and repeat until zero rows have been updated.

Best Answer

Related Solutions

Sql-server – Search on part of a column in a SQL Server cross-database view

SQL Server Update Column with Datetime – How to Use Row Values

Related Question