Sql-server – Index on a persisted computed column not seekable

computed-columnindexoptimizationsql serversql-server-2012

I have table, called Address, that has a persisted computed column called Hashkey. The column is deterministic but not precise. It has a unique index on it that is not seekable. If I run this query, returning the primary key:

SELECT @ADDRESSID= ISNULL(AddressId,0)
FROM dbo.[Address]
WHERE HashKey = @HashKey

I get this plan:

BasicPlan

If I force the index I get this even worse plan:

ForceIndex

If I try and force both the index and a seek, I get an error:

Query processor could not produce a query plan because of the hints defined in this query. Resubmit the query without specifying any hints and without using SET FORCEPLAN

Is this just because it's not precise? I thought that didn't matter if it was persisted?

Is there a way to make this index seekable without making this a non-computed column?

Does anyone have any links to information on this?

I can't post the actual table creation, but here is a test table that has the same issue:

drop TABLE [dbo].[Test]

CREATE TABLE [dbo].[Test]
  (
     [test]        [VARCHAR](100) NULL,
     [TestGeocode] [geography] NULL,
     [Hashkey] AS CAST(
                        ( hashbytes
                            ('SHA', 
                                ( RIGHT(REPLICATE(' ', (100)) + isnull([test], ''), ( 100 )) ) 
                                + RIGHT(REPLICATE(' ', (100)) + isnull([TestGeocode].[ToString](), ''), ( 100 ))
                            ) 
                        ) AS BINARY(20)                                                                                                        
                      ) PERSISTED
    CONSTRAINT [UK_Test_HashKey] UNIQUE NONCLUSTERED([Hashkey])
  )    
GO    
DECLARE @Hashkey BINARY(20)

SELECT [Hashkey]
FROM   [dbo].[Test] WITH (FORCESEEK) /*Query processor could not produce a query plan*/
WHERE  [Hashkey] = @Hashkey

Best Answer

The problem seems to be related to the fact that [TestGeocode].[ToString]() returns a max datatype (nvarchar(max)).

I also encounter the issue with this simpler version (changing the definition of c1 to varchar(8000) or using COALESCE instead of ISNULL resolves it)

DROP TABLE dbo.Test

CREATE TABLE dbo.Test
  (
     c1        VARCHAR(
                          MAX    --Fails
                        --  8000 --Works fine
                          ) NULL,
     comp1 AS CAST(ISNULL(c1, 'ABC') AS VARCHAR(100))
    CONSTRAINT UK_Test_comp1 UNIQUE NONCLUSTERED(comp1)
  )

GO

DECLARE @comp1 VARCHAR(100)

SELECT comp1
FROM   dbo.Test WITH (FORCESEEK)
WHERE  comp1 = @comp1 
OPTION (QUERYTRACEON 3604, QUERYTRACEON 8606);

Computed column references get expanded out to the underlying definition then matched back to the column later. This allows computed columns to be matched without referencing them by name at all and also allows simplification to operate on the underlying definitions.

ISNULL returns the datatype of the first parameter (VARCHAR(MAX) in my example). The return type of COALESCE will be VARCHAR(MAX) here too but it seems to be evaluated differently in a way that avoids the problem.

In the cases where the query succeeds the trace flag output includes the following

ScaOp_Convert varchar(max) collate 49160,Null,Var,Trim,ML=65535

    ScaOp_Const TI(varchar collate 49160,Var,Trim,ML=3) 
                      XVAR(varchar,Owned,Value=Len,Data = (3,ABC))

Where it fails this is replaced by

ScaOp_Identifier COL: ConstExpr1003

I speculate that in the cases where it fails the (implicit) CAST('ABC' AS VARCHAR(MAX)) is just done once and this is evaluated as a runtime constant (more information). However the reference to this runtime constant label, instead of the actual string literal value itself, prevents it from matching the computed column definition.

This rewrite avoids the issue in your query

CREATE TABLE [dbo].[Test]
  (
     [test]        [VARCHAR](100) NULL,
     [TestGeocode] [geography] NULL,
     [Hashkey] AS CAST(
                        ( hashbytes
                            ('SHA', 
                                ( RIGHT(SPACE(100) + isnull([test], ''), 100) ) 
                                + RIGHT(SPACE(100) + isnull(CAST(RIGHT([TestGeocode].[ToString](),100) AS VARCHAR(100)), ''),100)
                            ) 
                        ) AS BINARY(20)                                                                                                        
                      ) PERSISTED
    CONSTRAINT [UK_Test_HashKey] UNIQUE NONCLUSTERED([Hashkey])
  )

Explanation

The real question is why the optimizer felt the need to retrieve A, B, and C for the index seek at all. We would expect it to read the Comp column using a nonclustered index scan, and then perform a seek on the same index (alias T2) to locate the Top 1 record.

The query optimizer expands computed column references before optimization begins, to give it a chance to assess the costs of various query plans. For some queries, expanding the definition of a computed column allows the optimizer to find more efficient plans.

When the optimizer encounters a correlated subquery, it attempts to 'unroll it' to a form it finds easier to reason about. If it cannot find a more effective simplification, it resorts to rewriting the correlated subquery as an apply (a correlated join):

Apply rewrite

It just so happens that this apply unrolling puts the logical query tree into a form that does not work well with project normalization (a later stage that looks to match general expressions to computed columns, among other things).

In your case, the way the query is written interacts with internal details of the optimizer such that the expanded expression definition is not matched back to the computed column, and you end up with a seek that references columns A, B, and C instead of the computed column, Comp. This is the root cause.

Workaround

One idea to workaround this side-effect is to write the query as an apply manually:

SELECT
    T1.ID,
    T1.Comp,
    T1.D,
    CA.D2
FROM dbo.T AS T1
CROSS APPLY
(  
    SELECT TOP (1)
        D2 = T2.D
    FROM dbo.T AS T2
    WHERE
        T2.Comp = T1.Comp
        AND T2.D > T1.D
    ORDER BY
        T2.D ASC
) AS CA
WHERE
    T1.D IS NOT NULL -- DON'T CARE ABOUT INACTIVE RECORDS
ORDER BY
    T1.Comp;

Unfortunately, this query will not use the filtered index as we would hope either. The inequality test on column D inside the apply rejects NULLs, so the apparently redundant predicate WHERE T1.D IS NOT NULL is optimized away.

Without that explicit predicate, the filtered index matching logic decides it cannot use the filtered index. There are a number of ways to work around this second side-effect, but the easiest is probably to change the cross apply to an outer apply (mirroring the logic of the rewrite the optimizer performed earlier on the correlated subquery):

SELECT
    T1.ID,
    T1.Comp,
    T1.D,
    CA.D2
FROM dbo.T AS T1
OUTER APPLY
(  
    SELECT TOP (1)
        D2 = T2.D
    FROM dbo.T AS T2
    WHERE
        T2.Comp = T1.Comp
        AND T2.D > T1.D
    ORDER BY
        T2.D ASC
) AS CA
WHERE
    T1.D IS NOT NULL -- DON'T CARE ABOUT INACTIVE RECORDS
ORDER BY
    T1.Comp;

Now the optimizer does not need to use the apply rewrite itself (so the computed column matching works as expected) and the predicate is not optimized away either, so the filtered index can be used for both data access operations, and the seek uses the Comp column on both sides:

Outer Apply Plan

This would generally be preferred over adding A, B, and C as INCLUDEd columns in the filtered index, because it addresses the root cause of the problem, and does not require widening the index unnecessarily.

Persisted computed columns

As a side note, it is not necessary to mark the computed column as PERSISTED, if you don't mind repeating its definition in a CHECK constraint:

CREATE TABLE dbo.T 
(   
    ID integer IDENTITY(1, 1) NOT NULL,
    A varchar(20) NOT NULL,
    B varchar(20) NOT NULL,
    C varchar(20) NOT NULL,
    D date NULL,
    E varchar(20) NULL,
    Comp AS A + '-' + B + '-' + C,

    CONSTRAINT CK_T_Comp_NotNull
        CHECK (A + '-' + B + '-' + C IS NOT NULL),

    CONSTRAINT PK_T_ID 
        PRIMARY KEY (ID)
);

CREATE NONCLUSTERED INDEX IX_T_Comp_D
ON dbo.T (Comp, D) 
WHERE D IS NOT NULL;

The computed column is only required to be PERSISTED in this case if you want to use a NOT NULL constraint or to reference the Comp column directly (instead of repeating its definition) in a CHECK constraint.

Sql-server – Filtered index not chosen, and rejected with hint

I assumed it would follow my hint, and maybe error out at execution time if I wound up with some bad data and the index was missing some needed values.

The query optimizer will only use a filtered index in a query plan if it can guarantee (within its reasoning framework) that all possible matches can be served from the index. This is by design, to avoid the sort of runtime error you describe.

Failure to results in a NESTED LOOPS JOIN from my non-clustered index against a clustered index Key Lookup, presumably to grab the parentId. INCLUDING parent ID eliminates this, and leaves me with a nice non-clustered index scan.

This is a known current limitation. Adding the filtered column(s) to the key or include list is the standard workaround, and a current best practice for all sorts of semi-related reasons.

The FORCE ORDER, MERGE JOIN is definitely needed though.

Be extremely careful using hints (directives) like this unless you fully understand all the consequences. FORCE ORDER in particular is an extremely powerful and wide-ranging hint, with a number of non-obvious side-effects including the placement of aggregate operators, and the order of evaluation of subqueries and common table expressions.

For the most part, you should try to write queries that provide the query optimizer with enough good-quality information to make the right decisions without hints. The hinted plan may be 'optimal' today, but it may not remain so as the data volume and/or distribution changes over time.

Best Answer

Related Solutions

Sql-server – Index on Persisted Computed column needs key lookup to get columns in the computed expression

Explanation

Workaround

Persisted computed columns

Sql-server – Filtered index not chosen, and rejected with hint

Related Question