Sql-server – Non-clustered indexes for columns defined within a composite Primary Key

database-designindexperformancequery-performancesql serversql-server-2016

I have a multi-tenant database where I am utilizing an AccountId column within all tables as a part of the composite Primary Key for tenant isolation. Is it beneficial to create additional non-clustered indexes on each column that is part of the composite Primary Key to assist SQL Server with maintaining accurate statistics and improving query performance when joining to lookup tables?

For instance, within an associative table that defines one-to-many relationships between an Account and the American State in which they have offices, in theory which of the two options is preferable given the following structure and sample query?

Create the Account and State tables and populate with sample data.

DROP TABLE IF EXISTS [dbo].[Account];
DROP TABLE IF EXISTS [dbo].[State];

-- [Account] table and sample values.
IF OBJECT_ID('[dbo].[Account]', 'U') IS NULL
BEGIN
    CREATE TABLE [dbo].[Account] (
        [AccountId] [int] IDENTITY(1,1) NOT NULL
        ,[AccountAlias] [varchar](3) NOT NULL
        ,[AccountName] [varchar](128) NOT NULL
        ,CONSTRAINT [PK_Account] PRIMARY KEY CLUSTERED ([AccountId] ASC)
        ,CONSTRAINT [UQ_Account_Alias] UNIQUE NONCLUSTERED ([AccountAlias] ASC)
        ,CONSTRAINT [UQ_Account_Name] UNIQUE NONCLUSTERED ([AccountName] ASC)
    );

    SET IDENTITY_INSERT [dbo].[Account] ON;

    INSERT INTO [dbo].[Account] ([AccountId], [AccountAlias], [AccountName])
    VALUES (1, 'SA1', 'Sample Account 1'), (2, 'SA2', 'Sample Account 2'), (3, 'SA3', 'Sample Account 3')

    SET IDENTITY_INSERT [dbo].[Account] OFF;
END;
GO

-- [State] table and sample values.
IF OBJECT_ID('[dbo].[State]', 'U') IS NULL
BEGIN
    CREATE TABLE [dbo].[State] (
        [StateId] [tinyint] IDENTITY(1,1) NOT NULL
        ,[StateCode] [varchar](2) NOT NULL
        ,[StateName] [varchar](32) NOT NULL
        ,CONSTRAINT [PK_State] PRIMARY KEY CLUSTERED ([StateId] ASC)
        ,CONSTRAINT [UQ_State_Code] UNIQUE NONCLUSTERED ([StateCode] ASC)
        ,CONSTRAINT [UQ_State_Name] UNIQUE NONCLUSTERED ([StateName] ASC)
    );

    SET IDENTITY_INSERT [dbo].[State] ON;

    INSERT INTO [dbo].[State] ([StateId], [StateCode], [StateName])
    VALUES (1, 'AL', 'Alabama'), (2, 'AK', 'Alaska'), (3, 'AZ', 'Arizona'), (4, 'AR', 'Arkansas'), (5, 'CA', 'California')

    SET IDENTITY_INSERT [dbo].[State] OFF;
END;
GO

Create AccountState OPTION 1 – composite Primary Key only

DROP TABLE IF EXISTS [dbo].[AccountState];

IF OBJECT_ID('[dbo].[AccountState]', 'U') IS NULL
BEGIN
    CREATE TABLE [dbo].[AccountState] (
        [AccountId] [int] NOT NULL
        ,[StateId] [tinyint] NOT NULL
        ,CONSTRAINT [PK_AccountState] PRIMARY KEY CLUSTERED ([AccountId] ASC, [StateId] ASC)
        ,CONSTRAINT [FK_AccountState_Account] FOREIGN KEY ([AccountId]) REFERENCES [dbo].[Account]([AccountId])
        ,CONSTRAINT [FK_AccountState_State] FOREIGN KEY ([StateId]) REFERENCES [dbo].[State]([StateId])
    );

    INSERT INTO [dbo].[AccountState] ([AccountId], [StateId])
    SELECT A.[AccountId], S.[StateId]
    FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
    WHERE A.[AccountId] = 1 AND S.[StateId] IN (1, 2, 3)
    UNION
    SELECT A.[AccountId], S.[StateId]
    FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
    WHERE A.[AccountId] = 2 AND S.[StateId] IN (3, 4, 5)
    UNION
    SELECT A.[AccountId], S.[StateId]
    FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
    WHERE A.[AccountId] = 3 AND S.[StateId] IN (1, 3, 5)
END;
GO

Create AccountState OPTION 2 – composite Primary Key + non-clustered indexes

DROP TABLE IF EXISTS [dbo].[AccountState];

IF OBJECT_ID('[dbo].[AccountState]', 'U') IS NULL
BEGIN
    CREATE TABLE [dbo].[AccountState] (
        [AccountId] [int] NOT NULL
        ,[StateId] [tinyint] NOT NULL
        ,CONSTRAINT [PK_AccountState] PRIMARY KEY CLUSTERED ([AccountId] ASC, [StateId] ASC)
        ,CONSTRAINT [FK_AccountState_Account] FOREIGN KEY ([AccountId]) REFERENCES [dbo].[Account]([AccountId])
        ,CONSTRAINT [FK_AccountState_State] FOREIGN KEY ([StateId]) REFERENCES [dbo].[State]([StateId])
        ,INDEX [IX_AccountState_Account] NONCLUSTERED ([AccountId])
        ,INDEX [IX_AccountState_State] NONCLUSTERED ([StateId])
    );

    INSERT INTO [dbo].[AccountState] ([AccountId], [StateId])
    SELECT A.[AccountId], S.[StateId]
    FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
    WHERE A.[AccountId] = 1 AND S.[StateId] IN (1, 2, 3)
    UNION
    SELECT A.[AccountId], S.[StateId]
    FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
    WHERE A.[AccountId] = 2 AND S.[StateId] IN (3, 4, 5)
    UNION
    SELECT A.[AccountId], S.[StateId]
    FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
    WHERE A.[AccountId] = 3 AND S.[StateId] IN (1, 3, 5)
END;
GO

Sample query

SELECT
    A.[AccountName]
    ,S.[StateName]
FROM
    [dbo].[AccountState] A_S
    JOIN [dbo].[Account] A
        ON A_S.[AccountId] = A.[AccountId]
    JOIN [dbo].[State] S
        ON A_S.[StateId] = S.[StateId]
WHERE
    S.[StateCode] = 'CA'

Of the two options, what type of index combination would be seen as most suitable to scale? Composite primary key only or composite primary key plus additional non-clustered indexes? Or is there another option that is more viable?

Best Answer

I'm not sure that the sample query is a fair example to base a recommendation on. The sample query is not a typical multi-tenant application query since it is not specific to a particular client. It is more of a support or management query looking to gain insight about all (or at least multiple) customers. Of course, it could also be maintenance related (e.g. garbage collection would look for oldest dates and not care about AccountId). So let's separate this out:

  1. General

    I don't see any benefit to ever having a non-clustered index that is solely the leading / left-most key column in the clustered index. The clustered index is already in that order and so statistics exist for it. So, the IX_AccountState_Account index in Option 2 is purely waste, and a drag on the system.

  2. Support / Management / Maintenance queries

    These queries, especially the maintenance ones, can work across AccountId values. As such, some queries will certainly benefit from non-clustered indexes on clustered index key columns that are not AccountId (or to put it more generally: that are not the left-most / leading key column). This assumes that you have queries that filter / sort on only the entity ID. Not having any queries like that means you probably don't need this index.

  3. Application queries

    These queries should always include AccountId, so I don't see how they would benefit from an index that is solely on the entity ID.

    While I have worked on multi-tenant systems similar to what you are describing, I just thought of something that never occurred to me before: since the clustered index statistics are based on the left-most / leading column and not the combination of key columns, it might be beneficial to manually create statistics for those objects. I seem to recall that it is just the density portion that will account for all key columns (not just the leading column), but that might help the query optimizing. This needs to be tested as I have not tried it (and I won't be able to do such a test anytime soon).