Sql-server – Creating Non-Clustered Index on Non-Persisted Computed Column SQL Server

computed-columnindexsql serversql-server-2012

I am struggling to find any documentation on how SQL Server actually stores a non-persisted computed column.

Take the following example:

--SCHEMA
CREATE TABLE dbo.Invoice
(
    InvoiceID INT IDENTITY(1, 1) PRIMARY KEY,
    CustomerID INT FOREIGN KEY REFERENCES dbo.Customer(CustomerID),
    InvoiceStatus NVARCHAR(50) NOT NULL,
    InvoiceStatusID AS CASE InvoiceStatus 
                         WHEN 'Sent' THEN 1 
                         WHEN 'Complete' THEN 2
                         WHEN 'Received' THEN 3
                       END
)
GO

--INDEX
CREATE NONCLUSTERED INDEX IX_Invoice ON Invoice
(
    CustomerID ASC
)
INCLUDE
(
    InvoiceStatusID
)
GO

I get that it is stored at the leaf level, but if the value is not persisted how is anything stored at all? How does the index help SQL Server find these rows in this situation?

Any help greatly appreciated,

Many Thanks,

EDIT:

Thanks to Brent & Aaron for answering this, here's the PasteThePlan clearly showing what they explained.

Best Answer

When SQL Server creates the index on the computed field, the computed field is written to disk at that time - but only on the 8K pages of that index. SQL Server can compute the InvoiceStatusID as it reads through the clustered index - there's no need to write that data to the clustered index.

As you delete/update/insert rows in dbo.Invoice, the data in the indexes is kept up to date. (When InvoiceStatus changes, SQL Server knows to also update IX_Invoice.)

The best way you can see this for yourself is to actually do it: create these objects, and execute updates that touch the InvoiceStatusID field. Post the execution plan (PasteThePlan.com is helpful for this) if you want help seeing where the index updates are happening.

Related Solutions

Sql-server – computed columns, index, clustered index and covering index

Computed Columns can be stored in the data page in one of two ways. Either by creating them as PERSISTED or by including them in the clustered index definition.

If they are included in the clustered index definition then even if the columns are not marked as PERSISTED then the values will still be stored in each row. These index key values will additionally be stored in the upper level pages.

If the computed column is imprecise (e.g. float) or not verifiable as deterministic (e.g. CLR functions) then it is a requirement for the column to be marked as PERSISTED in order to be made part of an index key.

So to give an example

CREATE TABLE T
(
A INT,
C1 AS REPLICATE(CHAR(A),100) PERSISTED,
C2 AS REPLICATE(CHAR(A),200),
C3 AS CAST(A AS FLOAT) PERSISTED,
C4 AS CAST(A + 1 AS FLOAT)
)

CREATE UNIQUE CLUSTERED INDEX IX ON T(C2,C3)

C1 will be stored in just the data page rows as it is marked as PERSISTED but not indexed.
C2 will be stored in both the rows on the data page and the index higher levels as it is an index key column.
C3 will be stored as for C2. As it is imprecise it is a requirement to mark it as PERSISTED however.
C4 won't be stored anywhere as it is neither marked as PERSISTED nor indexed.

Similarly all computed columns referenced in non clustered index definitions as key columns need to be stored at all levels of the index as they are part of the index key. There is the same requirement regarding precise/deterministic results.

Fails

CREATE NONCLUSTERED INDEX IX2 ON T(A,C4)

With the error.

Cannot create index or statistics 'IX2' on table 'T' because the computed column 'C4' is imprecise and not persisted. Consider removing column from index or statistics key or marking computed column persisted.

To include it as part of the non clustered index key it must also be stored in the clustered index data pages. However

Succeeds.

CREATE NONCLUSTERED INDEX IX2 ON T(A) INCLUDE (C4)

Computed columns that are only INCLUDEd columns are persisted to the NCI leaf page and do not have the requirement that they also be persisted in the data page.

Sql-server – Index on Persisted Computed column needs key lookup to get columns in the computed expression

Why is a Key Lookup required to get A, B and C when they are not referenced in the query at all? I assume they are being used to calculate Comp, but why?

Columns A, B, and C are referenced in the query plan - they are used by the seek on T2.

Also, why can the query use the index on t2, but not on t1?

The optimizer decided that scanning the clustered index was cheaper than scanning the filtered nonclustered index and then performing a lookup to retrieve the values for columns A, B, and C.

Explanation

The real question is why the optimizer felt the need to retrieve A, B, and C for the index seek at all. We would expect it to read the Comp column using a nonclustered index scan, and then perform a seek on the same index (alias T2) to locate the Top 1 record.

The query optimizer expands computed column references before optimization begins, to give it a chance to assess the costs of various query plans. For some queries, expanding the definition of a computed column allows the optimizer to find more efficient plans.

When the optimizer encounters a correlated subquery, it attempts to 'unroll it' to a form it finds easier to reason about. If it cannot find a more effective simplification, it resorts to rewriting the correlated subquery as an apply (a correlated join):

Apply rewrite

It just so happens that this apply unrolling puts the logical query tree into a form that does not work well with project normalization (a later stage that looks to match general expressions to computed columns, among other things).

In your case, the way the query is written interacts with internal details of the optimizer such that the expanded expression definition is not matched back to the computed column, and you end up with a seek that references columns A, B, and C instead of the computed column, Comp. This is the root cause.

Workaround

One idea to workaround this side-effect is to write the query as an apply manually:

SELECT
    T1.ID,
    T1.Comp,
    T1.D,
    CA.D2
FROM dbo.T AS T1
CROSS APPLY
(  
    SELECT TOP (1)
        D2 = T2.D
    FROM dbo.T AS T2
    WHERE
        T2.Comp = T1.Comp
        AND T2.D > T1.D
    ORDER BY
        T2.D ASC
) AS CA
WHERE
    T1.D IS NOT NULL -- DON'T CARE ABOUT INACTIVE RECORDS
ORDER BY
    T1.Comp;

Unfortunately, this query will not use the filtered index as we would hope either. The inequality test on column D inside the apply rejects NULLs, so the apparently redundant predicate WHERE T1.D IS NOT NULL is optimized away.

Without that explicit predicate, the filtered index matching logic decides it cannot use the filtered index. There are a number of ways to work around this second side-effect, but the easiest is probably to change the cross apply to an outer apply (mirroring the logic of the rewrite the optimizer performed earlier on the correlated subquery):

SELECT
    T1.ID,
    T1.Comp,
    T1.D,
    CA.D2
FROM dbo.T AS T1
OUTER APPLY
(  
    SELECT TOP (1)
        D2 = T2.D
    FROM dbo.T AS T2
    WHERE
        T2.Comp = T1.Comp
        AND T2.D > T1.D
    ORDER BY
        T2.D ASC
) AS CA
WHERE
    T1.D IS NOT NULL -- DON'T CARE ABOUT INACTIVE RECORDS
ORDER BY
    T1.Comp;

Now the optimizer does not need to use the apply rewrite itself (so the computed column matching works as expected) and the predicate is not optimized away either, so the filtered index can be used for both data access operations, and the seek uses the Comp column on both sides:

Outer Apply Plan

This would generally be preferred over adding A, B, and C as INCLUDEd columns in the filtered index, because it addresses the root cause of the problem, and does not require widening the index unnecessarily.

Persisted computed columns

As a side note, it is not necessary to mark the computed column as PERSISTED, if you don't mind repeating its definition in a CHECK constraint:

CREATE TABLE dbo.T 
(   
    ID integer IDENTITY(1, 1) NOT NULL,
    A varchar(20) NOT NULL,
    B varchar(20) NOT NULL,
    C varchar(20) NOT NULL,
    D date NULL,
    E varchar(20) NULL,
    Comp AS A + '-' + B + '-' + C,

    CONSTRAINT CK_T_Comp_NotNull
        CHECK (A + '-' + B + '-' + C IS NOT NULL),

    CONSTRAINT PK_T_ID 
        PRIMARY KEY (ID)
);

CREATE NONCLUSTERED INDEX IX_T_Comp_D
ON dbo.T (Comp, D) 
WHERE D IS NOT NULL;

The computed column is only required to be PERSISTED in this case if you want to use a NOT NULL constraint or to reference the Comp column directly (instead of repeating its definition) in a CHECK constraint.