Sql-server – Creating additional unique keys on a partitioned table

database-designindexpartitioningsql serversql-server-2016

I am utilizing partition schemes in order to logically isolate data based on a TenantId column. TenantId is the partition column and clustering index across all tables that require tenant isolation. I would like to create unique, non-clustered indexes on the [Tenant] table that also holds TenantAlias and TenantName. I cannot create the unique indexes on the partition scheme unless they are a part of the partitioning column as a composite primary key. This fact leads me to worry about performance implications (especially in larger contexts), but this may be a misguided thought.

Even with the potential performance implications, given that the combination of TenantId, TenantAlias, and TenantName will always be a unique composition, I'm falling towards including the non-partitioning columns onto the clustered index. However, I would like to know if it's possible otherwise, if there are detriments to not including it in the composite key, or if it's better to avoid placing them on the partition scheme at all.

For optimal context, let's set up the security schema, file group, partition function, and partition scheme to a database called [IsolationExample].

IF (SCHEMA_ID('Auth') IS NULL)
BEGIN
    EXEC ('CREATE SCHEMA [Auth] AUTHORIZATION [dbo]')
END;
GO

ALTER DATABASE [IsolationExample]
ADD FILEGROUP [Auth]
GO

ALTER DATABASE [IsolationExample]
ADD FILE (
    NAME = N'IsolationExample_Auth', FILENAME = N'E:\MSSQL\Data\IsolationExample_Auth.ndf', SIZE = 512MB , MAXSIZE = UNLIMITED, FILEGROWTH = 512MB
    )
TO FILEGROUP [Auth]
GO

IF EXISTS (SELECT 1 FROM sys.partition_schemes WHERE [name] = 'PS_Tenant_Isolation') DROP PARTITION SCHEME [PS_Tenant_Isolation];
IF EXISTS (SELECT 1 FROM sys.partition_functions WHERE [name] = 'PF_Tenant_Isolation') DROP PARTITION FUNCTION [PF_Tenant_Isolation];

IF NOT EXISTS (SELECT 1 FROM sys.partition_functions WHERE [name] = 'PF_Tenant_Isolation')
BEGIN
CREATE PARTITION FUNCTION [PF_Tenant_Isolation] ([int])
    -- RANGE RIGHT values will be SPLIT based on external logic to add the correct values (1,2,3 ...) that corresponds with [Tenant].[TenantId].
    AS RANGE RIGHT FOR VALUES (0);
END;
GO

IF NOT EXISTS (SELECT 1 FROM sys.partition_schemes WHERE [name] = 'PS_Tenant_Isolation')
BEGIN
CREATE PARTITION SCHEME [PS_Tenant_Isolation]
    AS PARTITION [PF_Tenant_Isolation]
    ALL TO ([Auth]);
END;
GO

Option 1

The first attempted DDL operation does not work:

DROP TABLE IF EXISTS [Auth].[Tenant];
IF OBJECT_ID('[Auth].[Tenant]', 'U') IS NULL
BEGIN
    CREATE TABLE [Auth].[Tenant] (
        [TenantId] [int] IDENTITY(1,1) NOT NULL
        ,[TenantAlias] [varchar](3) NOT NULL
        ,[TenantName] [varchar](256) NOT NULL
        ,CONSTRAINT [PK_Tenant_TenantId] PRIMARY KEY CLUSTERED ([TenantId] ASC)
        ,CONSTRAINT [UQ_Tenant_TenantAlias] UNIQUE NONCLUSTERED ([TenantAlias] ASC)
        ,CONSTRAINT [UQ_Tenant_TenantName] UNIQUE NONCLUSTERED ([TenantName] ASC)
    ) ON [PS_Tenant_Isolation]([TenantId]);
END;
GO

Column 'TenantId' is partitioning column of the index 'UQ_Tenant_TenantName'. Partition columns for a unique index must be a subset of the index key.

Option 2

If I place the DDL operations outside of the scope of the CREATE TABLE operation, it also tosses the same error.

DROP TABLE IF EXISTS [Auth].[Tenant];
IF OBJECT_ID('[Auth].[Tenant]', 'U') IS NULL
BEGIN
    CREATE TABLE [Auth].[Tenant] (
        [TenantId] [int] IDENTITY(1,1) NOT NULL
        ,[TenantAlias] [varchar](3) NOT NULL
        ,[TenantName] [varchar](256) NOT NULL
        ,CONSTRAINT [PK_Tenant_TenantId] PRIMARY KEY CLUSTERED ([TenantId] ASC)
    ) ON [PS_Tenant_Isolation]([TenantId]);
    CREATE UNIQUE NONCLUSTERED INDEX [UQ_Tenant_TenantAlias] ON [Auth].[Tenant]([TenantAlias] ASC) ON [PS_Tenant_Isolation]([TenantId]);
    CREATE UNIQUE NONCLUSTERED INDEX [UQ_Tenant_TenantName] ON [Auth].[Tenant]([TenantName] ASC) ON [PS_Tenant_Isolation]([TenantId]);
END;
GO

Column 'TenantId' is partitioning column of the index 'UQ_Tenant_TenantName'. Partition columns for a unique index must be a subset of the index key.

*Option 3

If I do not place the unique indexes on the partition scheme, the DDL operation succeeds, but as one would expect it places the indexes on the specified file group directly. In my mind, this defeats the entire purpose of the partition scheme as we cannot leverage the performance benefits of the partitioning column. But I may be wrong on this and would like to be corrected if so.

DROP TABLE IF EXISTS [Auth].[Tenant];
IF OBJECT_ID('[Auth].[Tenant]', 'U') IS NULL
BEGIN
    CREATE TABLE [Auth].[Tenant] (
        [TenantId] [int] IDENTITY(1,1) NOT NULL
        ,[TenantAlias] [varchar](3) NOT NULL
        ,[TenantName] [varchar](256) NOT NULL
        ,CONSTRAINT [PK_Tenant_TenantId] PRIMARY KEY CLUSTERED ([TenantId] ASC)
    ) ON [PS_Tenant_Isolation]([TenantId]);
    CREATE UNIQUE NONCLUSTERED INDEX [UQ_Tenant_TenantAlias] ON [Auth].[Tenant]([TenantAlias] ASC) ON [Auth];
    CREATE UNIQUE NONCLUSTERED INDEX [UQ_Tenant_TenantName] ON [Auth].[Tenant]([TenantName] ASC) ON [Auth];
END;
GO

Tenant Isolation - Option 3

Option 4 – Composite Key

Lastly, if I include them as a composite key, the DDL operation works and places them on the partition scheme. This is the "desired" end behavior where the indexes are on the partition scheme, but I want to know if it is appropriate to include them as a composite key on the clustered index at all.

DROP TABLE IF EXISTS [Auth].[Tenant];
IF OBJECT_ID('[Auth].[Tenant]', 'U') IS NULL
BEGIN
    CREATE TABLE [Auth].[Tenant] (
        [TenantId] [int] IDENTITY(1,1) NOT NULL
        ,[TenantAlias] [varchar](3) NOT NULL
        ,[TenantName] [varchar](256) NOT NULL
        ,CONSTRAINT [PK_Tenant_TenantId] PRIMARY KEY CLUSTERED ([TenantId] ASC, [TenantAlias] ASC, [TenantName] ASC)
    ) ON [PS_Tenant_Isolation]([TenantId]);
END;
GO

Am I better served with option 3 or option 4, or perhaps an option 5 that I am unaware of?

I would also like to think outside of the context of purely this [Tenant] table and carry the same general understanding onto other tables that require additional unique keys as well. For example, a [Document] table with potentially billions of rows, where in addition to TenantId there is a bigint identity column of DocumentId as well as a unique DocumentBatchName.

Best Answer

The question for me becomes more about what are you trying to make unique, versus what are you trying to partition.

In option 4, adding the alias and name to the primary key puts you in an akward situation where only the ID, Alias, Name tuple is actually unique:

CREATE TABLE [Auth].[Tenant] (
    [TenantId] [int] IDENTITY(1,1) NOT NULL
    ,[TenantAlias] [varchar](3) NOT NULL
    ,[TenantName] [varchar](256) NOT NULL
    ,[Other] INTEGER
    ,CONSTRAINT [PK_Tenant_TenantId] PRIMARY KEY CLUSTERED ([TenantId] ASC, [TenantAlias] ASC, [TenantName] ASC)
) ON [PS_Tenant_Isolation]([TenantId]);

INSERT  INTO Auth.Tenant ( TenantAlias, TenantName )
VALUES  ( 'A', 'Apple' ),
        ( 'A', 'Apple' ),
        ( 'B', 'Banana' );

SELECT  *
FROM    Auth.Tenant 

In a scenario where the Other column above represents a bunch of columns containing tenant data, you'd be duplicating a ton of entirely unnecessary data in the clustered key, of all places, with TenantName being part of that index. Moaning about size issues aside, you've also got a much bigger problem, going about it this way:

UPDATE  Auth.Tenant
    SET TenantName = 'Orange'
WHERE   TenantID = 2;

SELECT  *
FROM    Auth.Tenant 

DROP TABLE Auth.Tenant;

It's really just a very weak model at this point.

In order to address the uniqueness requirement and gain the benefits of partitioning as well, you're really looking at doing your UNIQUE work on the Auth filegroup, then partitioning your data via the PS_Tenant_Isolation function separately.

IF ( OBJECT_ID( 'Auth.Tenant', 'U' ) IS NULL )
BEGIN
    --DROP TABLE Auth.Tenant;
    CREATE TABLE Auth.Tenant
    (
        TenantID                INTEGER IDENTITY( 1, 1 ) NOT NULL,
        TenantAlias             VARCHAR( 3 ) NOT NULL,
        TenantName              VARCHAR( 256 ) NOT NULL
    );

    ALTER TABLE Auth.Tenant
    ADD CONSTRAINT PK__Tenant
        PRIMARY KEY CLUSTERED ( TenantID )
    ON  [Auth];

    ALTER TABLE Auth.Tenant
    ADD CONSTRAINT UQ__Tenant__TenantAlias
        UNIQUE ( TenantAlias )
    ON  [Auth];

    ALTER TABLE Auth.Tenant
    ADD CONSTRAINT UQ__Tenant_TenantName
        UNIQUE ( TenantName )
    ON  [Auth];

    INSERT  INTO Auth.Tenant ( TenantAlias, TenantName )
    VALUES  ( 'A', 'Apple' ),
            ( 'B', 'Banana' );
END;
GO

IF ( OBJECT_ID( 'Auth.TenantData', 'U' ) IS NULL )
BEGIN
    --DROP TABLE Auth.TenantData;
    CREATE TABLE Auth.TenantData
    (
        TenantDataID            INTEGER IDENTITY( 1, 1 ) NOT NULL,
        TenantID                INTEGER NOT NULL,
        Other                   BIT
    );

    ALTER TABLE Auth.TenantData
    ADD CONSTRAINT PK__TenantData
        PRIMARY KEY CLUSTERED ( TenantID, TenantDataID )
    ON  [PS_Tenant_Isolation]( TenantID );

    ALTER TABLE Auth.TenantData
    ADD CONSTRAINT FK__TenantData__Tenant
        FOREIGN KEY ( TenantID )
        REFERENCES Auth.Tenant ( TenantID );

    INSERT  INTO Auth.TenantData ( TenantID, Other )
    VALUES  ( 1, 1 ),
            ( 2, 0 );
END;
GO

SELECT  *
FROM    Auth.Tenant;

SELECT  *
FROM    Auth.TenantData;

SELECT  ObjectName = OBJECT_NAME( ps.object_id ), 
        IndexName = i.name,
        DataSpaceName = ds.name,
        ps.partition_number, ps.row_count   
FROM    sys.dm_db_partition_stats ps
INNER JOIN sys.indexes i
    ON  ps.object_id = i.object_id
    AND ps.index_id = i.index_id
INNER JOIN sys.data_spaces ds
    ON  i.data_space_id = ds.data_space_id
WHERE   ps.object_id IN (   SELECT  OBJECT_ID( 'Auth.Tenant' )
                            UNION ALL
                            SELECT  OBJECT_ID( 'Auth.TenantData' ) );

DROP TABLE Auth.TenantData;
DROP TABLE Auth.Tenant;

Now your smallish "lookup" table for your Tenants can have properly unique indexes, and the data you are trying to partition for those tenants can benefit from the partitioning schema.

Edit: As per the comments, I'll add that non-aligned indexes do not necessarily lose all the benefits of a partitioning schema, as RID lookups can most certainly take advantage of knowing what partition they're in.