I have a multi-tenant database where I am utilizing an AccountId
column within all tables as a part of the composite Primary Key for tenant isolation. Is it beneficial to create additional non-clustered indexes on each column that is part of the composite Primary Key to assist SQL Server with maintaining accurate statistics and improving query performance when joining to lookup tables?
For instance, within an associative table that defines one-to-many relationships between an Account
and the American State
in which they have offices, in theory which of the two options is preferable given the following structure and sample query?
Create the Account
and State
tables and populate with sample data.
DROP TABLE IF EXISTS [dbo].[Account];
DROP TABLE IF EXISTS [dbo].[State];
-- [Account] table and sample values.
IF OBJECT_ID('[dbo].[Account]', 'U') IS NULL
BEGIN
CREATE TABLE [dbo].[Account] (
[AccountId] [int] IDENTITY(1,1) NOT NULL
,[AccountAlias] [varchar](3) NOT NULL
,[AccountName] [varchar](128) NOT NULL
,CONSTRAINT [PK_Account] PRIMARY KEY CLUSTERED ([AccountId] ASC)
,CONSTRAINT [UQ_Account_Alias] UNIQUE NONCLUSTERED ([AccountAlias] ASC)
,CONSTRAINT [UQ_Account_Name] UNIQUE NONCLUSTERED ([AccountName] ASC)
);
SET IDENTITY_INSERT [dbo].[Account] ON;
INSERT INTO [dbo].[Account] ([AccountId], [AccountAlias], [AccountName])
VALUES (1, 'SA1', 'Sample Account 1'), (2, 'SA2', 'Sample Account 2'), (3, 'SA3', 'Sample Account 3')
SET IDENTITY_INSERT [dbo].[Account] OFF;
END;
GO
-- [State] table and sample values.
IF OBJECT_ID('[dbo].[State]', 'U') IS NULL
BEGIN
CREATE TABLE [dbo].[State] (
[StateId] [tinyint] IDENTITY(1,1) NOT NULL
,[StateCode] [varchar](2) NOT NULL
,[StateName] [varchar](32) NOT NULL
,CONSTRAINT [PK_State] PRIMARY KEY CLUSTERED ([StateId] ASC)
,CONSTRAINT [UQ_State_Code] UNIQUE NONCLUSTERED ([StateCode] ASC)
,CONSTRAINT [UQ_State_Name] UNIQUE NONCLUSTERED ([StateName] ASC)
);
SET IDENTITY_INSERT [dbo].[State] ON;
INSERT INTO [dbo].[State] ([StateId], [StateCode], [StateName])
VALUES (1, 'AL', 'Alabama'), (2, 'AK', 'Alaska'), (3, 'AZ', 'Arizona'), (4, 'AR', 'Arkansas'), (5, 'CA', 'California')
SET IDENTITY_INSERT [dbo].[State] OFF;
END;
GO
Create AccountState
OPTION 1 – composite Primary Key only
DROP TABLE IF EXISTS [dbo].[AccountState];
IF OBJECT_ID('[dbo].[AccountState]', 'U') IS NULL
BEGIN
CREATE TABLE [dbo].[AccountState] (
[AccountId] [int] NOT NULL
,[StateId] [tinyint] NOT NULL
,CONSTRAINT [PK_AccountState] PRIMARY KEY CLUSTERED ([AccountId] ASC, [StateId] ASC)
,CONSTRAINT [FK_AccountState_Account] FOREIGN KEY ([AccountId]) REFERENCES [dbo].[Account]([AccountId])
,CONSTRAINT [FK_AccountState_State] FOREIGN KEY ([StateId]) REFERENCES [dbo].[State]([StateId])
);
INSERT INTO [dbo].[AccountState] ([AccountId], [StateId])
SELECT A.[AccountId], S.[StateId]
FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
WHERE A.[AccountId] = 1 AND S.[StateId] IN (1, 2, 3)
UNION
SELECT A.[AccountId], S.[StateId]
FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
WHERE A.[AccountId] = 2 AND S.[StateId] IN (3, 4, 5)
UNION
SELECT A.[AccountId], S.[StateId]
FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
WHERE A.[AccountId] = 3 AND S.[StateId] IN (1, 3, 5)
END;
GO
Create AccountState
OPTION 2 – composite Primary Key + non-clustered indexes
DROP TABLE IF EXISTS [dbo].[AccountState];
IF OBJECT_ID('[dbo].[AccountState]', 'U') IS NULL
BEGIN
CREATE TABLE [dbo].[AccountState] (
[AccountId] [int] NOT NULL
,[StateId] [tinyint] NOT NULL
,CONSTRAINT [PK_AccountState] PRIMARY KEY CLUSTERED ([AccountId] ASC, [StateId] ASC)
,CONSTRAINT [FK_AccountState_Account] FOREIGN KEY ([AccountId]) REFERENCES [dbo].[Account]([AccountId])
,CONSTRAINT [FK_AccountState_State] FOREIGN KEY ([StateId]) REFERENCES [dbo].[State]([StateId])
,INDEX [IX_AccountState_Account] NONCLUSTERED ([AccountId])
,INDEX [IX_AccountState_State] NONCLUSTERED ([StateId])
);
INSERT INTO [dbo].[AccountState] ([AccountId], [StateId])
SELECT A.[AccountId], S.[StateId]
FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
WHERE A.[AccountId] = 1 AND S.[StateId] IN (1, 2, 3)
UNION
SELECT A.[AccountId], S.[StateId]
FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
WHERE A.[AccountId] = 2 AND S.[StateId] IN (3, 4, 5)
UNION
SELECT A.[AccountId], S.[StateId]
FROM [dbo].[Account] A CROSS JOIN [dbo].[State] S
WHERE A.[AccountId] = 3 AND S.[StateId] IN (1, 3, 5)
END;
GO
Sample query
SELECT
A.[AccountName]
,S.[StateName]
FROM
[dbo].[AccountState] A_S
JOIN [dbo].[Account] A
ON A_S.[AccountId] = A.[AccountId]
JOIN [dbo].[State] S
ON A_S.[StateId] = S.[StateId]
WHERE
S.[StateCode] = 'CA'
Of the two options, what type of index combination would be seen as most suitable to scale? Composite primary key only or composite primary key plus additional non-clustered indexes? Or is there another option that is more viable?
Best Answer
I'm not sure that the sample query is a fair example to base a recommendation on. The sample query is not a typical multi-tenant application query since it is not specific to a particular client. It is more of a support or management query looking to gain insight about all (or at least multiple) customers. Of course, it could also be maintenance related (e.g. garbage collection would look for oldest dates and not care about
AccountId
). So let's separate this out:General
I don't see any benefit to ever having a non-clustered index that is solely the leading / left-most key column in the clustered index. The clustered index is already in that order and so statistics exist for it. So, the
IX_AccountState_Account
index in Option 2 is purely waste, and a drag on the system.Support / Management / Maintenance queries
These queries, especially the maintenance ones, can work across
AccountId
values. As such, some queries will certainly benefit from non-clustered indexes on clustered index key columns that are notAccountId
(or to put it more generally: that are not the left-most / leading key column). This assumes that you have queries that filter / sort on only the entity ID. Not having any queries like that means you probably don't need this index.Application queries
These queries should always include
AccountId
, so I don't see how they would benefit from an index that is solely on the entity ID.While I have worked on multi-tenant systems similar to what you are describing, I just thought of something that never occurred to me before: since the clustered index statistics are based on the left-most / leading column and not the combination of key columns, it might be beneficial to manually create statistics for those objects. I seem to recall that it is just the density portion that will account for all key columns (not just the leading column), but that might help the query optimizing. This needs to be tested as I have not tried it (and I won't be able to do such a test anytime soon).