Sql-server – Any way around unique index 16 column max

constraintdatabase-designsql serversql-server-2008

According to the CREATE INDEX documentation:

Up to 16 columns can be combined into a single composite index key.

We've got a table with ~18 columns that need to form a unique combination. This table is not performance sensitive — we rarely update values/insert records. We just need to ensure that we avoid duplicating our records… and thought we could impose a simple uniqueness constraint.

Any ideas? I'm open to avoiding the unique index/constraint entirely if there is a better way.

Best Answer

Add a persisted computed column that combines the 18 keys, then create an unique index on the computed column:

alter table t add all_keys as c1+c2+c3+...+c18 persisted;
create unique index i18 on t (all_keys);

See Creating Indexes on Computed Columns.

Another approach is to create an indexed view:

create view v 
with schemabinding
as select c1+c2+c3+...+c18 as all_keys
from dbo.t;

create unique clustered index c18 on v(all_keys);

See Creating Indexed Views.

Both approaches allow for a partial key aggregate: aggregate c1+c2+c3 as k1, c4+c5+c6 as k2 etc. then index/create indexed view on (k1, k2, ...). Thia could be beneficial for range scans (index can be used for search on c1+c2+c3.

Of course, all + operation in my example are string aggregation, the actual operator to use depends on the types of all those columns (ie. you may have to use explicit casts).

PS. As unique constraints are enforced by an unique index, any restriction on unique indexes will apply to unique constraints as well:

create table t (
    c1 char(3), c2 char(3), c3 char(3), c4 char(3),
    c5 char(3), c6 char(3), c7 char(3), c8 char(3),
    c9 char(3), c10 char(3), c11 char(3), c12 char(3),
    c13 char(3), c14 char(3), c15 char(3), c16 char(3),
    c17 char(3), c18 char(3), c19 char(3), c20 char(3),
    constraint unq unique
      (c1,c2,c3,c4,c5,c6,c7,c8,c9,c10,c11,c12,c13,c14,c15,c16,c17,c18));
go  


Msg 1904, Level 16, State 1, Line 3
The index '' on table 't' has 18 column names in index key list. 
The maximum limit for index or statistics key column list is 16.
Msg 1750, Level 16, State 0, Line 3
Could not create constraint. See previous errors.

However, creating the constraint on a persisted computed column works:

create table t (
    c1 char(3), c2 char(3), c3 char(3), c4 char(3),
    c5 char(3), c6 char(3), c7 char(3), c8 char(3),
    c9 char(3), c10 char(3), c11 char(3), c12 char(3),
    c13 char(3), c14 char(3), c15 char(3), c16 char(3),
    c17 char(3), c18 char(3), c19 char(3), c20 char(3),
    all_c as 
        c1+c2+c3+c4+c5+c6+c7+c8+c9+c10+c11+
        c12+c13+c14+c15+c16+c17+c18 
        persisted
        constraint unq unique (all_c));
go

Obviously, the persisted column consumes the space on disk so the approach may be bad for a very large table. The indexed view approach does not have this problem, it only consumes the space for the index, not the space for the computed column and index.

Explanation

The real question is why the optimizer felt the need to retrieve A, B, and C for the index seek at all. We would expect it to read the Comp column using a nonclustered index scan, and then perform a seek on the same index (alias T2) to locate the Top 1 record.

The query optimizer expands computed column references before optimization begins, to give it a chance to assess the costs of various query plans. For some queries, expanding the definition of a computed column allows the optimizer to find more efficient plans.

When the optimizer encounters a correlated subquery, it attempts to 'unroll it' to a form it finds easier to reason about. If it cannot find a more effective simplification, it resorts to rewriting the correlated subquery as an apply (a correlated join):

Apply rewrite

It just so happens that this apply unrolling puts the logical query tree into a form that does not work well with project normalization (a later stage that looks to match general expressions to computed columns, among other things).

In your case, the way the query is written interacts with internal details of the optimizer such that the expanded expression definition is not matched back to the computed column, and you end up with a seek that references columns A, B, and C instead of the computed column, Comp. This is the root cause.

Workaround

One idea to workaround this side-effect is to write the query as an apply manually:

SELECT
    T1.ID,
    T1.Comp,
    T1.D,
    CA.D2
FROM dbo.T AS T1
CROSS APPLY
(  
    SELECT TOP (1)
        D2 = T2.D
    FROM dbo.T AS T2
    WHERE
        T2.Comp = T1.Comp
        AND T2.D > T1.D
    ORDER BY
        T2.D ASC
) AS CA
WHERE
    T1.D IS NOT NULL -- DON'T CARE ABOUT INACTIVE RECORDS
ORDER BY
    T1.Comp;

Unfortunately, this query will not use the filtered index as we would hope either. The inequality test on column D inside the apply rejects NULLs, so the apparently redundant predicate WHERE T1.D IS NOT NULL is optimized away.

Without that explicit predicate, the filtered index matching logic decides it cannot use the filtered index. There are a number of ways to work around this second side-effect, but the easiest is probably to change the cross apply to an outer apply (mirroring the logic of the rewrite the optimizer performed earlier on the correlated subquery):

SELECT
    T1.ID,
    T1.Comp,
    T1.D,
    CA.D2
FROM dbo.T AS T1
OUTER APPLY
(  
    SELECT TOP (1)
        D2 = T2.D
    FROM dbo.T AS T2
    WHERE
        T2.Comp = T1.Comp
        AND T2.D > T1.D
    ORDER BY
        T2.D ASC
) AS CA
WHERE
    T1.D IS NOT NULL -- DON'T CARE ABOUT INACTIVE RECORDS
ORDER BY
    T1.Comp;

Now the optimizer does not need to use the apply rewrite itself (so the computed column matching works as expected) and the predicate is not optimized away either, so the filtered index can be used for both data access operations, and the seek uses the Comp column on both sides:

Outer Apply Plan

This would generally be preferred over adding A, B, and C as INCLUDEd columns in the filtered index, because it addresses the root cause of the problem, and does not require widening the index unnecessarily.

Persisted computed columns

As a side note, it is not necessary to mark the computed column as PERSISTED, if you don't mind repeating its definition in a CHECK constraint:

CREATE TABLE dbo.T 
(   
    ID integer IDENTITY(1, 1) NOT NULL,
    A varchar(20) NOT NULL,
    B varchar(20) NOT NULL,
    C varchar(20) NOT NULL,
    D date NULL,
    E varchar(20) NULL,
    Comp AS A + '-' + B + '-' + C,

    CONSTRAINT CK_T_Comp_NotNull
        CHECK (A + '-' + B + '-' + C IS NOT NULL),

    CONSTRAINT PK_T_ID 
        PRIMARY KEY (ID)
);

CREATE NONCLUSTERED INDEX IX_T_Comp_D
ON dbo.T (Comp, D) 
WHERE D IS NOT NULL;

The computed column is only required to be PERSISTED in this case if you want to use a NOT NULL constraint or to reference the Comp column directly (instead of repeating its definition) in a CHECK constraint.

Best Answer

Related Solutions

Best Type for Composite Key with Varying Value Types in SQL Server

Sql-server – Index on Persisted Computed column needs key lookup to get columns in the computed expression

Explanation

Workaround

Persisted computed columns

Related Question