PostgreSQL Multi-Tenant – Backup and Restore of a Single Database

postgresql

I am thinking about developing a simple CRM for florists. Florists (companies with 1–5 employees) will have a locally installed app on their PCs (call it "Floppa"). Floppa will communicate with a Windows Communication Foundation (WCF) service hosted on my remote server. For each customer there will be one running instance of the WCF service. The business layer under the WCF service will communicate with a PostgreSQL database dedicated to that specific customer.

I would like to use continuous archiving. If I understand correctly, the WAL will contain records of every change in every database in the cluster.
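I mean something roughly like this in postgresql.conf (just a minimal sketch; the archive directory and the timeout value are placeholders, not recommendations):

    # postgresql.conf – minimal continuous-archiving setup (paths are placeholders)
    wal_level = replica          # enough WAL detail for archiving / PITR (PostgreSQL 9.6+)
    archive_mode = on
    archive_command = 'test ! -f /backups/wal/%f && cp %p /backups/wal/%f'
    archive_timeout = 300        # force a WAL segment switch at least every 5 minutes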

If one of the customers accidentally deletes or changes some data and asks me to recover "this morning's state of the data", how could I quickly comply with that request? To my current knowledge I would need to do the following (see the sketch after this list):

  1. Stop the WCF service (to prevent work on the database that will be replaced later)
  2. Restore the whole cluster to "this morning's state" on another machine
  3. Dump the customer's database in its morning state
  4. Replace the customer's live database with the dumped one
  5. Start the WCF service

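Roughly, I imagine steps 2–4 looking something like this with point-in-time recovery (PostgreSQL 12+ syntax; all paths, the port, the timestamp, and the database name florist_42 are made-up placeholders):

    # 2. restore the base backup into a scratch data directory and recover to this morning
    mkdir -p /tmp/restore_cluster
    tar -xf /backups/base/base.tar -C /tmp/restore_cluster
    chmod 700 /tmp/restore_cluster
    cat >> /tmp/restore_cluster/postgresql.conf <<'EOF'
    restore_command = 'cp /backups/wal/%f %p'
    recovery_target_time = '2013-05-06 08:00:00'
    recovery_target_action = 'promote'
    EOF
    touch /tmp/restore_cluster/recovery.signal
    pg_ctl -D /tmp/restore_cluster -o '-p 5433' -l /tmp/restore.log start

    # 3. once recovery has reached the target and the server accepts connections,
    #    dump only that customer's database
    pg_dump -p 5433 -Fc -f /tmp/florist_42_morning.dump florist_42
    pg_ctl -D /tmp/restore_cluster stop

    # 4. replace the live database (the WCF service is already stopped)
    dropdb florist_42
    createdb florist_42
    pg_restore -d florist_42 /tmp/florist_42_morning.dump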
Is that correct? Or is there a better solution to this backup/restore issue? What happens if I create one database per cluster? What would the drawbacks be? I guess one of the advantages would be much faster recovery.

What will differ if the number of customers is 10, 50, 100, or 250? How does PostgreSQL handle hundreds of clusters?

Thanks.

Best Answer

Restoring the database will take as long as it takes. There is no way to know with the information you provide. If each individual database is quite small, step 2 could be acceptably fast with dozens or maybe even hundreds of databases in the cluster.

Having many database clusters will incur more overhead. There is a certain minimal size for a cluster (around 21 MB for a typical initdb run) which, with a single cluster, is amortized over all the databases in it. You will have to manage many WAL streams, and if most of your archiving is driven by archive_timeout, splitting the databases across clusters means many more archived files in total. You will likely run out of available semaphores, but you can recover some of them by lowering the max_connections setting, and on a modern OS you could probably configure the system to allow a lot of semaphores. You should be able to test some of this on your own infrastructure with some relatively simple scripting, for example along the lines sketched below.
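Something like this would create a batch of throwaway clusters so you can watch semaphore, memory, and disk usage; all paths, ports, and settings here are only examples for the test:

    # create and start 50 small clusters on consecutive ports (placeholder paths/values)
    mkdir -p /tmp/clusters
    for i in $(seq 1 50); do
        dir=/tmp/clusters/c$i
        initdb -D "$dir" > /dev/null
        pg_ctl -D "$dir" -l "$dir/server.log" \
               -o "-p $((5500 + i)) -c max_connections=20 -c shared_buffers=16MB" start
    done

    # then check what it costs
    ipcs -s | wc -l        # rough count of System V semaphore sets in use (Linux)
    du -sh /tmp/clusters   # on-disk footprint of the empty clusters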

But if you do run one cluster for each customer and have 100s of customers, there is no reason you couldn't spread them over multiple machines.