Postgresql – What are the downsides of using PARTITION for multi-tenant access

amazon-rdsmulti-tenantpartitioningpostgresql

We have an RDS Postgres database which keeps individual store data in a common schema with a field called store_id to distinguish between stores. Let's say we have 1MB per store and 10k stores so we're talking about 10GB of data.

We're making some changes to how the system works in general and rethinking some of the pieces. One possibility that came up is to create partitions for each store so they will be completely isolated. It seems ideal for our use case, but we don't have experience with this many partitions so I don't know if there are gotchas.

A few notes, the queries are always done on a per-store level – for this data set we don't need querying across stores. Most of the stores will be active most of the time – so this isn't like historical data where old partitions won't be written to. There isn't another dimension like time that would make sense to partition on.

Is there anything we should be concern about with a partition/schema per store or any areas that might be a problem?

UPDATE:
I found some more references to this:

Another (perhaps important) detail I forgot to add is that the individual terminals at each store have their own copy of the database (and sometimes multiple terminals/databases per location) so that can continue to run completely offline for extended periods of time. It's an asynchronous master-master data-level replication – writes can happen via an app to the cloud or on the terminals and we handle the syncing and conflict resolution.

The thought to migrate to a schema per store relates more to simplicity in the replication and backup logic to treat each store as an independent entity. It also could make it easier to scale because we can more easily move stores to different cloud databases. It also might help us with migrations if we could make schema changes per store as the upgrade to new versions (not sure if/how this would work with PG partitions.) It's not related to security concerns.

Best Answer

10k stores so we're talking about 10GB of data. One possibility that came up is to create partitions for each store so they will be completely isolated. It seems ideal for our use case, but we don't have experience with this many partitions so I don't know if there are gotchas.

10,000 partitions is insanity. Partitions are done for performance not for security. Their use has planner, index, and underlying file-system implications. There is absolutely no reason to ever do that. You can however create 10,000 users, or authenticate against LDAP and manage security like that. For more information see this, and examples.

Ultimately, I probably wouldn't do either unless I absolutely needed too. It's easier to design abstracted secure app-level interfaces (database-models/middleware etc) to authenticate and to fly with that. I've seen row-level permissions used in production code, but typically that's for a class of data access.

For example. Perhaps your application has two levels of access one for unauthenticated forward-facing web servers; and, one for on-premise access. In this case, you can ensure with pg_hba that admins are on site (or are otherwise hackers clever enough to have compromised on-site equipment).