Single vs Multiple Non-Clustered Indexes with Composite Primary Key – SQL Server 2016

database-designindexsql serversql-server-2016

I have a database where I am utilizing a TenantId throughout all tables that need to be uniquely identified to a particular tenant, and due to the ordering requirements within composite keys, I have TenantId as the first in the index list. Now comes into question an Authentication piece where the User table contains the TenantId, UserId (the IDENTITY column), and Email among other login-specific items.

The login portal is not tenant-specific, so when logging in, the user will simply input their Email, thus seeking to the row that verifies their login information. In this scenario, we are unable to immediately leverage the composite primary key of TenantId and UserId until we have found the row that applies to Email.

The composite primary key on TenantId and UserId will always be utilized in all other conditional clauses. However, in order to leverage this key in the first place, we must first seek to that row based on a query of Email. Without an index on Email, a table scan will occur instead.

My question is, what type of index combination would be seen as most suitable in this scenario? A single non-clustered index on Email alone, another composite key on UserId and Email in conjunction with the single non-clustered index on Email with INCLUDES on other relevant data, or none of the above?

The schema is similar as such:

CREATE TABLE [User] (
     [TenantId] [int] NOT NULL
    ,[UserId] [int] IDENTITY(1,1) NOT NULL
    ,[Email] [varchar](64) NOT NULL
    ,[FirstName] [varchar](32) NOT NULL
    ,[MiddleName] [varchar](32) NULL
    ,[LastName] [varchar](32) NOT NULL
    ,[PasswordHash] [varbinary](64) NOT NULL
    ,[PasswordSalt] [varbinary](32) NOT NULL
    ,[HashMethodId] [tinyint] NOT NULL
    ,[IsActive] [bit] NOT NULL CONSTRAINT [DF_User_IsActive] DEFAULT 1
    ,[IsLocked] [bit] NOT NULL CONSTRAINT [DF_User_IsLocked] DEFAULT 0

    ,CONSTRAINT [PK_User_TenantId_UserId] PRIMARY KEY CLUSTERED ([TenantId] ASC, [UserId] ASC)
    ,INDEX [IX_User_UserId_Email] NONCLUSTERED ([UserId] ASC, [Email] ASC)
    ,CONSTRAINT [FK_Tenant_TenantId] FOREIGN KEY ([TenantId]) REFERENCES [Tenant]([TenantId])
    ,CONSTRAINT [FK_HashMethod_HashMethodId] FOREIGN KEY ([HashMethodId]) REFERENCES [HashMethod]([HashMethodId])
);
CREATE NONCLUSTERED INDEX [IX_User_Email] ON [User]([Email]) INCLUDE ([PasswordHash],[PasswordSalt],[HashMethodId],[IsActive],[IsLocked])

-- Note for research: Why can an index that has INCLUDE not be specified in CREATE TABLE?

My understanding is that [IX_User_UserId_Email] is useful in this scenario to quickly tie back to [PK_User_TenantId_UserId], thus seeking to the appropriate level of isolation more efficiently. Is this an incorrect assumption? Am I better served with using just [IX_User_Email]?

  • All tables will JOIN to User on TenantId and UserId.
  • No tables will strictly JOIN to User based on UserId.
  • A lookup will happen based strictly on a query of Email. TenantId and UserId will not be known until the row is fetched. Once the row is fetched, remaining queries will utilize TenantId and UserId.

Another option I have been tossing around is within the Tenant table, including a Domain column that specifies the tenant source e-mail domain (which will always be the same across a tenant). Once the user has entered their Email and tabs/selects the Password field on the login page, it will parse out the e-mail domain (@sample.com), allowing us to query the smaller Tenant table to find their TenantId, thus being able to leverage the composite key [PK_User_TenantId_UserId] and thus only having to utilize a non-clustered index on Email. This may be a needless approach, however.

Best Answer

This is a strange one as [UserId] [int] IDENTITY(1,1) NOT NULL is unique.

There could be a case to use [UserId] alone as the PK. You will get less fragmentation compared to a composite clustered index TenantId, UserId.

I get you plan to use TenantId, UserId in all queries but you don't need to. UserID will uniquely identify each user.

If you have reports by TenantId you can include that in the tables and put a non-clustered index on it.

As far as email I would just put a non-clustered unique index on it. You can include the other fields if you like but it is just picking up data from a single row so it is not really necessary. Right now there is nothing to prevent duplicate email.

Could argue repeating TenantId in all tables is not 3NF as it can be derived from UserId.

I personally would not repeat TenantId and have a view for each table that brings it in. I get you don't want to do it that way but it is still my answer.

select t.*, u.tenant 
  from table t 
  join [user] u 
    on u.userid = t.userid  

User is not a good name for a table as it is a keyword.