PostgreSQL Backup – How to Do Per-Tenant Backups for Multi-Tenant Databases?

backupmulti-tenantpostgresqlrestore

Are there any best practices, experiences or guidelines when it comes to do per-tenant backups/restores in a microservice multi-tenant application?

In my case I have multiple Postgres servers which are regularly backed up as a whole. Each of them contains a single database which is migrated using Flyway.
The single database contains one schema with all the tables with the data of all tenants. Tenants are identified by UUID and all the data has foreign key relations so that e.g. one tenant can easily be deleted.

Now there's the requirement to be able to do restores for an individual tenant. I have found very little about the entire topic so far, but the only process I have come up with so far is this:

  • Restore all the backups to a different set of databases or different database servers
  • Spin up a new instance of each microservice so that it can start in isolation and is able to perform the necessary database migrations, if required
  • Delete the tenant from each of the production databases
  • Execute script or code in each service that identifies the data in the respective restored database and copy exactly these rows over to the production database

However doing or automating this seems like a disproportionate effort to me. I'm specifically looking for alternative (simpler) approaches, tools or databases that can simplify this process or actual experience how other people approach this problem.

Best Answer

Postgres has no feature to ferret out semantically a subset of the rows in your tables. Nor does any other database I’ve heard of.

You will need to write your own code to retrieve just the one tenant’s rows from each of the relevant tables. This code might be in your app, or might be done on the server within Postgres.

Also, you’ll need to decide if any other tables not split by tenant should also be copied into each tenant’s backup.

I have heard about some database vendors considering features to support multi-tenancy explicitly. The database would understand that the data for each tenant should be considered separate and kept in logical and/or physical silos. But I don’t know of any such products ready for production yet.