SQL Server – Max Value from Multiple Columns in Computed Column

computed-columnsql serversql-server-2017stored-procedures

In this article solution 1, it talks about finding the maximum value from many columns. I would like to conduct this in a computed/persisted column. How would I do this?

https://www.mssqltips.com/sqlservertip/4067/find-max-value-from-multiple-columns-in-a-sql-server-table/

create table dbo.TestAmount
(
    Amount1 int,
    Amount2 int,
    Amount3 int,
    MaxValuedata as (select MAX(MaxAmount) FROM (VALUES (Amount1),(Amount2),(Amount3)) AS MaxAmount(LastAmount)) 
)

May have 10 values in the future, trying to prevent long case statement.

Best Answer

First thing... I'm not advocating THAT you do this... I'm simply showing you HOW to do this. If your table experiences high volume inserts and/or updates, you could see a noticeable performance hit. Using scalar UDFs in computed columns will force all queries against the table to run serially.

Start by creating a scalar function similar to the following...

CREATE FUNCTION dbo.GreatestOfThreeInts
/* ================================================================================================
Scalar function created for the sole purpose of calculating the MaxVal computed column on dbo.Test.
================================================================================================ */
(
    @C1 INT,
    @C2 INT,
    @C3 INT
)
RETURNS INT WITH SCHEMABINDING --, RETURNS NULL ON NULL INPUT --<< use this if NULLs are a possibility..
AS 
BEGIN 
    DECLARE @MaxVal INT = ( SELECT MAX(x.Val) FROM ( VALUES (@C1), (@C2), (@C3) ) x (Val) );
    RETURN ISNULL(@MaxVal, 0);
END;
GO

Either create your table with a PERSISTED computed column or, if the table already exists, use the ALTER / ADD syntax to add the PERSISTED computed column...

CREATE TABLE dbo.Test (
    C1 INT NOT NULL,
    C2 INT NOT NULL,
    c3 INT NOT NULL,
    MaxVal AS dbo.GreatestOfThreeInts(C1, C2, C3) PERSISTED NOT NULL    -- persist the value so that it doesn't need to be constantly recomputed
    );
GO

CREATE NONCLUSTERED INDEX ix_Test_MaxVal ON dbo.Test (MaxVal) INCLUDE (C1, C2, c3);
GO

Why do I keep saying PERSISTED?... Buy once, cry once... Unless you have a very write heavy usage pattern, you'll be better off computing the values on inserts & updates, than every time you reference the column in a select... Especially if that column is going to be used in a predicate or sorting operation.

Sooo... Let's see it in action...

INSERT dbo.Test (C1, C2, c3) VALUES
    (123,456,789),
    (345,478,123),
    (523,321,852),
    (111,471,951),
    (874,320,357),
    (965,102,478);
GO 

SELECT * FROM dbo.Test t ORDER BY t.MaxVal OPTION(QUERYTRACEON 176);
GO 

SELECT * FROM dbo.Test t WHERE t.MaxVal >= 800 AND t.MaxVal < 900 OPTION(QUERYTRACEON 176);
GO

Results...

C1          C2          c3          MaxVal
----------- ----------- ----------- -----------
345         478         123         478
123         456         789         789
523         321         852         852
874         320         357         874
111         471         951         951
965         102         478         965


C1          C2          c3          MaxVal
----------- ----------- ----------- -----------
523         321         852         852
874         320         357         874

Hope this helps, Jason

Edit #1: A BIG THANK YOU to Erik for adding the link, pointing out the fact that using a scalar UDF to compute a column will prevent the optimizer from considering a parallel execution plan... Even when the computed column is persisted. A fact that I actually knew but completely omitted from my initial answer. What I didn't know is the OPTION(QUERYTRACEON 176) thing... Picking up that little nugget, more than covered the cost of admission for me!

Edit #2: Without inviting the religious debate of "NULL vs NOT NULL" column constraints, I'll simply state that my personal "default" is make all columns NOT NULL unless there is a compelling reason to do otherwise... That said, @MartinSmith makes some good points... Including the fact that the OP, by not specifying NULLability, made all columns NULLable. Plus, after the back & forth, I was just curious to see if the RETURN ISNULL(@MaxVal, 0); was doing anything other than irritating people reading the T_SQL... Short answer... It does not.

The following includes the introduction of a "control" table (no computed column) and NULLable versions of dbo.GreatestOfThreeInts & dbo.Test (dbo.GreatestOfThreeInts_2 & dbo.Test_2)

CREATE FUNCTION dbo.GreatestOfThreeInts_2
/* ==================================================================================================
Scalar function created for the sole purpose of calculating the MaxVal computed column on dbo.Test_2.
================================================================================================== */
(
    @C1 INT,
    @C2 INT,
    @C3 INT
)
RETURNS INT WITH SCHEMABINDING
AS 
BEGIN 
    DECLARE @MaxVal INT = ( SELECT MAX(x.Val) FROM ( VALUES (@C1), (@C2), (@C3) ) x (Val) );
    RETURN @MaxVal;
END;
GO

CREATE TABLE dbo.Test_2 (
    C1 INT NULL,
    C2 INT NULL,
    c3 INT NULL,
    MaxVal AS dbo.GreatestOfThreeInts_2(C1, C2, C3) PERSISTED   -- persist the value so that it doesn't need to be constantly recomputed
    );
GO

CREATE NONCLUSTERED INDEX ix_Test2_MaxVal ON dbo.Test_2 (MaxVal) INCLUDE (C1, C2, c3);
GO

CREATE TABLE dbo.Control (
    C1 int NOT NULL,
    C2 int NOT NULL,
    c3 int NOT NULL,
    MaxVal INT NOT NULL
    );
GO

CREATE NONCLUSTERED INDEX ix_Control_MaxVal ON dbo.Control (MaxVal) INCLUDE (C1, C2, c3);
GO

And because the 6 rows in my original answer isn't much of a test, the following will load all 3 tables with 1 million rows of test data...

-- clear out any existing data...
TRUNCATE TABLE dbo.Test;
GO 
TRUNCATE TABLE dbo.Test_2;
GO 
TRUNCATE TABLE dbo.Control;
GO 

-- add 1M rows of test data...
WITH 
    cte_n1 (n) AS (SELECT 1 FROM (VALUES (1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) n (n)), 
    cte_n2 (n) AS (SELECT 1 FROM cte_n1 a CROSS JOIN cte_n1 b),
    cte_n3 (n) AS (SELECT 1 FROM cte_n2 a CROSS JOIN cte_n2 b),
    cte_Tally (c1, c2, c3) AS (
        SELECT TOP (1000000)
            ABS(CHECKSUM(NEWID())) % 9000 + 1000,   -- randomly generate INTs between 1000 and 9999
            ABS(CHECKSUM(NEWID())) % 9000 + 1000,   -- no, I don't have an actual reason for using that specific range...
            ABS(CHECKSUM(NEWID())) % 9000 + 1000    -- 
        FROM
            cte_n3 a CROSS JOIN cte_n3 b
        )
INSERT dbo.Test (C1, C2, c3)
SELECT 
    t.c1, 
    t.c2, 
    t.c3
FROM
    cte_Tally t;
GO

-- use dbo.Test to insert dbo.Test_2 & dbo.Control so all 3 tables will have the exact same data values...
-- (to compare actual insert performance, use the cte_Tally to load all tables)

INSERT dbo.Test_2 (C1, C2, c3)
SELECT 
    t.C1, t.C2, t.c3
FROM
    dbo.Test t;
GO

INSERT dbo.Control (C1, C2, c3, MaxVal)
SELECT 
    t.C1, t.C2, t.c3, t.MaxVal
FROM
    dbo.Test t;
GO

Explanation

The real question is why the optimizer felt the need to retrieve A, B, and C for the index seek at all. We would expect it to read the Comp column using a nonclustered index scan, and then perform a seek on the same index (alias T2) to locate the Top 1 record.

The query optimizer expands computed column references before optimization begins, to give it a chance to assess the costs of various query plans. For some queries, expanding the definition of a computed column allows the optimizer to find more efficient plans.

When the optimizer encounters a correlated subquery, it attempts to 'unroll it' to a form it finds easier to reason about. If it cannot find a more effective simplification, it resorts to rewriting the correlated subquery as an apply (a correlated join):

Apply rewrite

It just so happens that this apply unrolling puts the logical query tree into a form that does not work well with project normalization (a later stage that looks to match general expressions to computed columns, among other things).

In your case, the way the query is written interacts with internal details of the optimizer such that the expanded expression definition is not matched back to the computed column, and you end up with a seek that references columns A, B, and C instead of the computed column, Comp. This is the root cause.

Workaround

One idea to workaround this side-effect is to write the query as an apply manually:

SELECT
    T1.ID,
    T1.Comp,
    T1.D,
    CA.D2
FROM dbo.T AS T1
CROSS APPLY
(  
    SELECT TOP (1)
        D2 = T2.D
    FROM dbo.T AS T2
    WHERE
        T2.Comp = T1.Comp
        AND T2.D > T1.D
    ORDER BY
        T2.D ASC
) AS CA
WHERE
    T1.D IS NOT NULL -- DON'T CARE ABOUT INACTIVE RECORDS
ORDER BY
    T1.Comp;

Unfortunately, this query will not use the filtered index as we would hope either. The inequality test on column D inside the apply rejects NULLs, so the apparently redundant predicate WHERE T1.D IS NOT NULL is optimized away.

Without that explicit predicate, the filtered index matching logic decides it cannot use the filtered index. There are a number of ways to work around this second side-effect, but the easiest is probably to change the cross apply to an outer apply (mirroring the logic of the rewrite the optimizer performed earlier on the correlated subquery):

SELECT
    T1.ID,
    T1.Comp,
    T1.D,
    CA.D2
FROM dbo.T AS T1
OUTER APPLY
(  
    SELECT TOP (1)
        D2 = T2.D
    FROM dbo.T AS T2
    WHERE
        T2.Comp = T1.Comp
        AND T2.D > T1.D
    ORDER BY
        T2.D ASC
) AS CA
WHERE
    T1.D IS NOT NULL -- DON'T CARE ABOUT INACTIVE RECORDS
ORDER BY
    T1.Comp;

Now the optimizer does not need to use the apply rewrite itself (so the computed column matching works as expected) and the predicate is not optimized away either, so the filtered index can be used for both data access operations, and the seek uses the Comp column on both sides:

Outer Apply Plan

This would generally be preferred over adding A, B, and C as INCLUDEd columns in the filtered index, because it addresses the root cause of the problem, and does not require widening the index unnecessarily.

Persisted computed columns

As a side note, it is not necessary to mark the computed column as PERSISTED, if you don't mind repeating its definition in a CHECK constraint:

CREATE TABLE dbo.T 
(   
    ID integer IDENTITY(1, 1) NOT NULL,
    A varchar(20) NOT NULL,
    B varchar(20) NOT NULL,
    C varchar(20) NOT NULL,
    D date NULL,
    E varchar(20) NULL,
    Comp AS A + '-' + B + '-' + C,

    CONSTRAINT CK_T_Comp_NotNull
        CHECK (A + '-' + B + '-' + C IS NOT NULL),

    CONSTRAINT PK_T_ID 
        PRIMARY KEY (ID)
);

CREATE NONCLUSTERED INDEX IX_T_Comp_D
ON dbo.T (Comp, D) 
WHERE D IS NOT NULL;

The computed column is only required to be PERSISTED in this case if you want to use a NOT NULL constraint or to reference the Comp column directly (instead of repeating its definition) in a CHECK constraint.

Best Answer

Related Solutions

Sql-server – SQL Server : Transactional Replication Computed Column

Sql-server – Index on Persisted Computed column needs key lookup to get columns in the computed expression

Explanation

Workaround

Persisted computed columns

Related Question