Postgresql – How to model parent-child relationship correctly in multi-tenant, multi-schema Postgres database

database-designpostgresql

I have a multi-tenant SaaS app where each tenant gets their own schema. When a user creates an account, I create a new schema and run a bunch of migrations to create tables under that schema. Then, when the user wants to authenticate, they provide their tenant info (this can be their subdomain, e.g. mycompany.coolapp.com, or it can be a unique account ID), and the app routes subsequent queries to that tenant's schema.

However, certain pieces of tenants' data will be public. For instance, the user will need to create invoices and email their links (containing a UUID) to their customers. When the customer clicks the link, the app needs to be able to figure out which tenant the invoice belongs to, based on the UUID.

To do this, I created a "public.invoices" table, with two columns: uuid and tenant. The latter contains the tenant's schema name, which I can then use to perform the subsequent queries (i.e. the invoice's actual data) to the appropriate schema.

Question 1: Is this design sound? Is there some other, more standard way of routing unauthenticated/pre-authenticated requests to specific schemas in situations where each tenant gets their own schema?

Question 2: How do I tell Postgres that, when the tenant deletes the invoice (the one in their schema), the public record (under "public.invoices") also should be deleted? I'm trying to avoid using two queries, if some database-provided option or feature exists. I looked at CASCADE, but it seems that only goes from parent to child, and as far as I can tell there isn't a way to define a foreign key relationship in this scenario where the tenant invoice is the parent record.

Best Answer

There are many ways to implement this, but let me suggest partitioning.

The table public.invoice would then be list partitioned, with the tenant being the partitioning key. Each partition sits in the schema of the respective tenant.

That way everybody can query their own invoices in their own schema or all invoices in the public schema.

The big advantage of partitioning here is that it becomes very easy to remove a tenant: simple detach the respective partition and drop it.

Note that partitioning becomes impractical with thousands of partitions. But if you plan to have that many tenants, it is a bad idea to implement that with thousands of schemas with identical tables in a single database. Having too many tables will slow down metadata operations, and it makes life harder for autovacuum (which in turn will make your life hard).