SQL Server – Multi-schema, multi-tenant databases – data isolation through composite primary keys

database-design, index, multi-tenant, sql-server, sql-server-2016

We are designing a multi-schema, multi-tenant database for SQL Server 2016 that will serve a basic CRUD application with small to medium transactional throughput and comprise 15-20 tables. For the sake of data isolation and security, we are exploring a composite primary key made up of [TenantId] [int] and [TenantIsolationId] [uniqueidentifier]. These two columns will repeat throughout the tables in a hierarchical manner, enforcing referential integrity with appropriate foreign keys and allowing us to enforce deeper row-level security.
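As a rough sketch of the row-level security piece, SQL Server 2016 can filter rows through a security policy bound to an inline predicate function. The [Security] schema, the function and policy names, and the 'TenantId' session key below are assumptions made purely for illustration, and the sketch presumes the [Auth].[User] table shown further down already exists.

```sql
-- Hypothetical names throughout: [Security] schema, fn_TenantPredicate,
-- TenantPolicy, and the 'TenantId' session key are illustrative assumptions.
CREATE SCHEMA [Security];
GO

-- Inline predicate: yields a row only when the row's tenant matches
-- the tenant stored in the current session's context.
CREATE FUNCTION [Security].[fn_TenantPredicate] (@TenantId int)
RETURNS TABLE
WITH SCHEMABINDING
AS
RETURN
    SELECT 1 AS [IsVisible]
    WHERE @TenantId = CAST(SESSION_CONTEXT(N'TenantId') AS int);
GO

-- FILTER hides other tenants' rows on reads; BLOCK rejects inserts
-- that would create a row for a different tenant.
CREATE SECURITY POLICY [Security].[TenantPolicy]
    ADD FILTER PREDICATE [Security].[fn_TenantPredicate]([TenantId]) ON [Auth].[User],
    ADD BLOCK PREDICATE [Security].[fn_TenantPredicate]([TenantId]) ON [Auth].[User] AFTER INSERT
    WITH (STATE = ON);
GO

-- The application would set the tenant once per connection;
-- @read_only = 1 prevents it from being changed later in the session.
EXEC sys.sp_set_session_context @key = N'TenantId', @value = 1, @read_only = 1;
```

The predicate here keys on [TenantId] alone; whether [TenantIsolationId] also needs to take part in the predicate depends on the security provisions in play.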

In most scenarios that I have read, the usage of a GUID for [TenantIsolationId] seems to be controversial due to performance implications, especially with regard to space. However, we believe that for the management of physical file partitioning, resource pool delegation, replication, portability, and general referential integrity, a pairing of an integer and a GUID will allow for better isolation and security within the multi-tenant database. We are bound by a range of security provisions, which further leads us down the path of using this composite key type.

In thinking about child composite keys, I know the order of the columns specified matters. However, I cannot decide whether we should feed the same composite key structure down the table chain or whether it is better to segregate it in a more traditional manner.

For instance, given the two table structures:

CREATE TABLE [Auth].[Tenant] (
    [TenantId] [int] IDENTITY(1,1) NOT NULL
    ,[TenantIsolationId] [uniqueidentifier] NOT NULL CONSTRAINT [DF_Tenant_TenantIsolationId] DEFAULT NEWID()
    ,[TenantName] [varchar](256) NOT NULL

    ,CONSTRAINT [PK_Tenant_TenantId_TenantIsolationId] PRIMARY KEY CLUSTERED ([TenantId] ASC, [TenantIsolationId] ASC)
    ,CONSTRAINT [AK_Tenant_TenantName] UNIQUE NONCLUSTERED ([TenantName] ASC)
);

CREATE TABLE [Auth].[User] (
    [UserId] [int] IDENTITY(1,1) NOT NULL
    ,[TenantId] [int] NOT NULL
    ,[TenantIsolationId] [uniqueidentifier] NOT NULL
    ,[FirstName] [varchar](32) NOT NULL
    ,[LastName] [varchar](32) NOT NULL
    ,[UserName] [varchar](64) NOT NULL

    ,CONSTRAINT [PK_User_TenantId_TenantIsolationId_UserId] PRIMARY KEY CLUSTERED ([TenantId] ASC, [TenantIsolationId] ASC, [UserId] ASC)
    ,CONSTRAINT [AK_User_UserName] UNIQUE NONCLUSTERED ([UserName] ASC)
);

If we are performing row-level security and isolating the data based on [TenantId] and [TenantIsolationId], is the [UserId] primary key better served as above in a composite key, or separately?

E.g.:

CREATE TABLE [Auth].[User] (
    [UserId] [int] IDENTITY(1,1) NOT NULL
    ,[TenantId] [int] NOT NULL
    ,[TenantIsolationId] [uniqueidentifier] NOT NULL
    ,[FirstName] [varchar](32) NOT NULL
    ,[LastName] [varchar](32) NOT NULL
    ,[UserName] [varchar](64) NOT NULL

    ,CONSTRAINT [PK_User_UserId] PRIMARY KEY CLUSTERED ([UserId] ASC)
    ,INDEX [IX_User_TenantId_TenantIsolationId] NONCLUSTERED ([TenantId] ASC, [TenantIsolationId] ASC)
    ,CONSTRAINT [FK_Tenant_TenantId_TenantIsolationId] FOREIGN KEY ([TenantId],[TenantIsolationId]) REFERENCES [Auth].[Tenant]([TenantId],[TenantIsolationId])
    ,CONSTRAINT [AK_User_UserName] UNIQUE NONCLUSTERED ([UserName] ASC)
);

Based on my understanding of what I have read so far, the composite key is only a good idea if we will always be looking up the data on all three columns. Since we want to isolate the data, I cannot foresee instances where we wouldn't want to look up the [TenantId] as well as the [TenantIsolationId] before seeking to the [UserId]. Perhaps, however, I am misunderstanding the pros and cons and would be better served by using only [UserId] for the primary key, coupled with an index on [TenantId] and [TenantIsolationId]. Is my thinking flawed?
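To make the trade-off concrete, here is a minimal sketch of the two lookup shapes in question, written against the [Auth].[User] table above. The parameter values are hypothetical. Note that a composite clustered key can also be used for a seek when only its leading columns are supplied, not just when all three are.

```sql
-- Hypothetical parameter values, for illustration only.
DECLARE @TenantId int = 1;
DECLARE @TenantIsolationId uniqueidentifier = '00000000-0000-0000-0000-000000000001';
DECLARE @UserId int = 42;

-- Tenant-scoped point lookup. With the composite clustered key
-- ([TenantId], [TenantIsolationId], [UserId]) all three predicates
-- match the key, so this can be a single clustered index seek.
SELECT [UserId], [UserName]
FROM [Auth].[User]
WHERE [TenantId] = @TenantId
  AND [TenantIsolationId] = @TenantIsolationId
  AND [UserId] = @UserId;

-- Tenant-scoped listing. Only the leading key columns are supplied,
-- which is still a seek (a range scan within one tenant) on the
-- composite clustered key.
SELECT [UserId], [UserName]
FROM [Auth].[User]
WHERE [TenantId] = @TenantId
  AND [TenantIsolationId] = @TenantIsolationId;
```

With the primary key on [UserId] alone, the first query would seek on [UserId] and check the tenant columns as a residual filter, while the second would rely on the nonclustered index and may need key lookups to fetch [UserName]. Comparing the actual execution plans for both designs is the reliable way to confirm this on real data.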

I'm still in the infant stages of the schema's development, so I'll be running plenty of performance tests with large quantities of dummy data once I get the initial sketches completed. But as a general practice, what is recommended in this scenario?

Furthermore, and more generally for multi-tenant database architectures that must ensure high levels of data isolation, has there been any significant movement forward that doesn't lend itself to a two-valued key combination? I have read and watched a good deal on the topic, primarily referencing Salesforce's Multitenant Magic webinar and Google's F1 white paper. More recent articles still tend to follow the concepts outlined there despite their age, and while I am building a schema for a database that will not be anywhere close to the scale of Salesforce and AdWords, I find myself leaning towards the principles they have set out.

Best Answer

But as a general practice, what is recommended in this scenario?

Database-per-tenant is the best practice here. There are scenarios where it is impractical, but it should be your strong preference when designing any multi-tenant system on SQL Server.

Database-per-tenant gives you:

  • Excellent security and data isolation that's verifiable and easy to sell.
  • Optimal Performance, with no shared query plans.
  • Horizontal Scalability across as many Instances/Elastic Pools as you need.
  • Performance Isolation, with the option of isolated, dedicated resources for a tenant.
  • Per-Tenant servicing, including the ability to upgrade or hotfix a single tenant.
  • Per-Tenant backup and recovery, and optionally differentiated HA/DR.
  • Per-Tenant Ad-Hoc reporting.

Other than that:

What is the point of the TenantIsolationID?

You should include TenantID in every clustered index that contains tenant data.

It should be the leading column, unless you're using it for partitioning, in which case it can be a trailing column.
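A minimal sketch of both key layouts, assuming the [Auth] schema exists. The table names, the partition function and scheme, and the boundary values are all hypothetical and chosen only to show the column ordering.

```sql
-- Unpartitioned case: TenantId leads the clustered key, so each
-- tenant's rows are stored contiguously and tenant filters can seek.
CREATE TABLE [Auth].[UserLogin] (
    [TenantId] [int] NOT NULL
    ,[UserLoginId] [int] IDENTITY(1,1) NOT NULL
    ,[LoginTime] [datetime2](3) NOT NULL

    ,CONSTRAINT [PK_UserLogin_TenantId_UserLoginId] PRIMARY KEY CLUSTERED ([TenantId] ASC, [UserLoginId] ASC)
);
GO

-- Partitioned case: TenantId is the partitioning column. It must still
-- be part of the unique clustered key, but it may be a trailing column.
-- Boundary values are arbitrary placeholders.
CREATE PARTITION FUNCTION [PF_TenantId] (int)
    AS RANGE RIGHT FOR VALUES (100, 200, 300);
GO

CREATE PARTITION SCHEME [PS_TenantId]
    AS PARTITION [PF_TenantId] ALL TO ([PRIMARY]);
GO

CREATE TABLE [Auth].[UserLoginPartitioned] (
    [UserLoginId] [int] IDENTITY(1,1) NOT NULL
    ,[TenantId] [int] NOT NULL
    ,[LoginTime] [datetime2](3) NOT NULL

    ,CONSTRAINT [PK_UserLoginPartitioned_UserLoginId_TenantId] PRIMARY KEY CLUSTERED ([UserLoginId] ASC, [TenantId] ASC)
) ON [PS_TenantId] ([TenantId]);
```

In the partitioned layout, queries that filter on [TenantId] can still be limited to the relevant partition through partition elimination, which is why the leading position matters less there.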

As a scale plan, you must plan for splitting your single database into smaller databases. But a split can be a one-way operation.

Whether TenantID is an INT or UNIQUEIDENTIFIER only matters for index size. All your secondary indexes will be bigger if you use UNIQUEIDENTIFIER. But that's not a huge cost. Fragmentation and page splitting won't be a big deal here. See Good Page Splits and Sequential GUID Key Generation for details on the performance implications of having multiple insert points in a table.
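For context on the size difference: an INT is 4 bytes and a UNIQUEIDENTIFIER is 16 bytes, and the clustered key columns are carried in every nonclustered index on the table. One way to measure the effect on test data is a query like the sketch below, which reports per-index space for the [Auth].[User] table from the question. The table name is the only assumption; substitute whichever table is being tested.

```sql
-- Reports used space and row count per index for one table,
-- so INT-keyed and GUID-keyed variants can be compared directly.
SELECT
    i.[name] AS [IndexName]
    ,i.[type_desc] AS [IndexType]
    ,SUM(ps.[used_page_count]) * 8 AS [UsedKB]  -- data pages are 8 KB each
    ,SUM(ps.[row_count]) AS [RowCount]
FROM sys.dm_db_partition_stats AS ps
INNER JOIN sys.indexes AS i
    ON i.[object_id] = ps.[object_id]
    AND i.[index_id] = ps.[index_id]
WHERE ps.[object_id] = OBJECT_ID(N'[Auth].[User]')
GROUP BY i.[name], i.[type_desc]
ORDER BY [UsedKB] DESC;
```

Running this after loading the same dummy data into each candidate schema gives a like-for-like comparison of index sizes.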