PostgreSQL Multi-Tenant – Backup and Restore of a Single Database

postgresql

I am thinking about developing a simple CRM for florists. Florists (companies with 1–5 employees) will have a locally installed app on their PCs (call it "Floppa"). Floppa will communicate with a Windows Communication Foundation (WCF) service hosted on my remote server. For each customer there will be one running instance of the WCF service. The business layer under the WCF service will communicate with a PostgreSQL database dedicated to that specific customer.

I would like to use continuous archiving. If I understand correctly, the WAL will contain records of every change in every database in the cluster.
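I mean something roughly like this in postgresql.conf (just a minimal sketch; the archive directory and the timeout value are placeholders, not recommendations):

    # postgresql.conf – minimal continuous-archiving setup (paths are placeholders)
    wal_level = replica          # enough WAL detail for archiving / PITR (PostgreSQL 9.6+)
    archive_mode = on
    archive_command = 'test ! -f /backups/wal/%f && cp %p /backups/wal/%f'
    archive_timeout = 300        # force a WAL segment switch at least every 5 minutes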

If one of the customers accidentally deletes or changes some data and asks me to recover "this morning's state of the data", how could I quickly comply with that request? To my current knowledge I would need to do the following (see the sketch after this list):

  1. Stop the WCF service (to prevent work on the database that will be replaced later)
  2. Restore the whole cluster to "this morning's state" on another machine
  3. Dump the customer's database in its morning state
  4. Replace the customer's live database with the dumped one
  5. Start the WCF service

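Roughly, I imagine steps 2–4 looking something like this with point-in-time recovery (PostgreSQL 12+ syntax; all paths, the port, the timestamp, and the database name florist_42 are made-up placeholders):

    # 2. restore the base backup into a scratch data directory and recover to this morning
    mkdir -p /tmp/restore_cluster
    tar -xf /backups/base/base.tar -C /tmp/restore_cluster
    chmod 700 /tmp/restore_cluster
    cat >> /tmp/restore_cluster/postgresql.conf <<'EOF'
    restore_command = 'cp /backups/wal/%f %p'
    recovery_target_time = '2013-05-06 08:00:00'
    recovery_target_action = 'promote'
    EOF
    touch /tmp/restore_cluster/recovery.signal
    pg_ctl -D /tmp/restore_cluster -o '-p 5433' -l /tmp/restore.log start

    # 3. once recovery has reached the target and the server accepts connections,
    #    dump only that customer's database
    pg_dump -p 5433 -Fc -f /tmp/florist_42_morning.dump florist_42
    pg_ctl -D /tmp/restore_cluster stop

    # 4. replace the live database (the WCF service is already stopped)
    dropdb florist_42
    createdb florist_42
    pg_restore -d florist_42 /tmp/florist_42_morning.dump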
Is that correct? Or is there a better solution to this backup/restore issue? What happens if I create one database per cluster? What would the drawbacks be? I guess one of the advantages would be much faster recovery.

What will differ if the number of customers is 10, 50, 100, or 250? How does PostgreSQL handle hundreds of clusters?

Thanks.

Best Answer

Restoring the database will take as long as it takes. There is no way to know with the information you provide. If each individual database is quite small, step 2 could be acceptably fast with dozens or maybe even hundreds of databases in the cluster.

Having many database clusters will incur more overhead. There is a certain minimal size for a cluster (around 21 MB for a typical initdb run) which, with a single cluster, is amortized over all the databases in it. You will have to manage many WAL streams, and if most of your archiving is driven by archive_timeout, splitting the databases across clusters means many more archived files in total. You will likely run out of available semaphores, but you can recover some of them by lowering the max_connections setting, and on a modern OS you could probably configure the system to allow a lot of semaphores. You should be able to test some of this on your own infrastructure with some relatively simple scripting, for example along the lines sketched below.
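Something like this would create a batch of throwaway clusters so you can watch semaphore, memory, and disk usage; all paths, ports, and settings here are only examples for the test:

    # create and start 50 small clusters on consecutive ports (placeholder paths/values)
    mkdir -p /tmp/clusters
    for i in $(seq 1 50); do
        dir=/tmp/clusters/c$i
        initdb -D "$dir" > /dev/null
        pg_ctl -D "$dir" -l "$dir/server.log" \
               -o "-p $((5500 + i)) -c max_connections=20 -c shared_buffers=16MB" start
    done

    # then check what it costs
    ipcs -s | wc -l        # rough count of System V semaphore sets in use (Linux)
    du -sh /tmp/clusters   # on-disk footprint of the empty clusters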

But if you do run one cluster for each customer and have 100s of customers, there is no reason you couldn't spread them over multiple machines.