Find columns with different values compared with a reference row

sap-hana

My table, MARC, has more than 200 columns with WERKS (plant) as the key.

Most of the entries in MARC have values that come from a reference plant. So, usually, all the 200 columns will contain the same values.

I need to find the columns that have different values compared with the reference plant row, in order to find the deviations and show the names of the columns that had different values.

The main question is: how to do the select when it is not known which columns need to be selected.

Example

In the above, ABAS is the reference plant.

I expect only the first row to be returned, because only that row contains any differences compared with the reference row. Differences in that row occur in the following columns:

WERKS
PSTAT
DISMM
DISPO

The returned row should contain only those four columns.

I do not need any other columns to be able to show on the UI that these 3 columns have different values for that particular WERKS.

I am using SAP HANA.

Best Answer

This answer shows an implementation of the techniques mentioned by Michael Green in comments to the question. It uses SQL Server, but the basic idea is applicable to any SQL-compatible database engine that supports dynamic SQL. Support for UNPIVOT and PIVOT is not required.

The broad idea is:

Convert rows to key-value pairs
Compare with the reference row to identify differences
Convert the differences to one row per WERKS using dynamic SQL

It is the use of dynamic SQL that gets around the issue of not knowing which columns to SELECT - they are determined dynamically.

Sample table and data

CREATE TABLE #MARC
(
    c1 integer PRIMARY KEY,
    c2 integer NULL,
    c3 integer NULL,
    c4 integer NULL,
    c5 integer NULL
);

INSERT #MARC
    (c1,c2,c3,c4,c5)
VALUES
    (1, 2, 3, 4, 5), -- The reference row
    (2, 2, 3, 4, 5),
    (3, 2, 3, 4, 5),
    (4, 2, 3, 7, 9), -- Differences: 7 in c4; 9 in c5 
    (5, 6, 3, 4, 8); -- Differences: 6 in c2; 8 in c5

Unpivot and find differences

-- Holds differences found (WERKS keys and column key-value pairs)
CREATE TABLE #Differences
(
    c1 integer NOT NULL,
    name varchar(2) NOT NULL,
    value integer NOT NULL
);

WITH 
    Ref AS
    (
        -- Unpivoted reference row
        SELECT 
            M.c1, 
            CA.name,
            CA.value
        FROM #MARC AS M
        CROSS APPLY 
        (
            VALUES
                ('c2', M.c2),
                ('c3', M.c3),
                ('c4', M.c4),
                ('c5', M.c5)
        ) AS CA (name, value)
        WHERE
            M.c1 = 1
    ), 
    NonRef AS
    (
        -- Unpivoted non-reference rows
        SELECT 
            M.c1, 
            CA.name,
            CA.value
        FROM #MARC AS M
        CROSS APPLY 
        (
            VALUES
                ('c2', M.c2),
                ('c3', M.c3),
                ('c4', M.c4),
                ('c5', M.c5)
        ) AS CA (name, value)
        WHERE
            M.c1 <> 1
    ), 
    Differences AS
    (
        -- Find column value differences
        SELECT 
            NonRef.c1, 
            NonRef.name, 
            NonRef.value
        FROM NonRef
        JOIN Ref
            ON Ref.name = NonRef.name      -- Same column name
            AND Ref.value <> NonRef.value  -- Different value
    )
INSERT #Differences
    (c1, name, value)
SELECT
    D.c1,
    D.name,
    D.value
FROM Differences AS D;

Differences

-- Show differences for illustrative purposes
SELECT
    D.c1,
    D.name,
    D.value
FROM #Differences AS D;

╔════╦══════╦═══════╗
║ c1 ║ name ║ value ║
╠════╬══════╬═══════╣
║  4 ║ c4   ║     7 ║
║  4 ║ c5   ║     9 ║
║  5 ║ c2   ║     6 ║
║  5 ║ c5   ║     8 ║
╚════╩══════╩═══════╝

Dynamic SQL

-- Header
DECLARE @sql varchar(max) =
    'SELECT D.c1' + CHAR(13);

-- Add aggregation
WITH D AS
(
    SELECT DISTINCT D2.name
    FROM #Differences AS D2
)
SELECT @sql +=
    (
        SELECT 
            ', MAX(CASE WHEN D.name = ' + CHAR(39) + D.name + CHAR(39) + 
            ' THEN D.value END) AS ' + QUOTENAME(D.name) + CHAR(13) AS [text()]
        FROM D
        FOR XML PATH (''), TYPE
    ).value('(./text())[1]', 'varchar(max)')

-- Footer    
SET @sql +=
    'FROM #Differences AS D' + CHAR(13) +
    'GROUP BY D.c1;'

The constructed SQL statement in @sql is:

SELECT D.c1
, MAX(CASE WHEN D.name = 'c2' THEN D.value END) AS [c2]
, MAX(CASE WHEN D.name = 'c4' THEN D.value END) AS [c4]
, MAX(CASE WHEN D.name = 'c5' THEN D.value END) AS [c5]
FROM #Differences AS D
GROUP BY D.c1;

That is the manual PIVOT syntax. The final result is obtained by executing the SQL:

EXECUTE (@sql);

Results:

╔════╦══════╦══════╦════╗
║ c1 ║  c2  ║  c4  ║ c5 ║
╠════╬══════╬══════╬════╣
║  4 ║ NULL ║ 7    ║  9 ║
║  5 ║ 6    ║ NULL ║  8 ║
╚════╩══════╩══════╩════╝

This is the minimal set of columns describing the column differences with respect to the reference row, as required.

-- Clean up temporary tables
DROP TABLE 
    #Differences,
    #MARC;

There is some SQL Server-specific syntax above, but the manual UNPIVOT can be achieved in other ways if APPLY or LATERAL is not supported e.g. with a cross join to a suitable identity matrix table. Generating the dynamic SQL uses FOR XML PATH as an efficient way to concatenate the strings, but again this can be done in any language.

Stack Exchange Data Explorer Demo