Postgresql – Robust way to build a DWH without downtime

I have a process that build a data warehouse using Postgres. The process runs every day at midnight and works in three steps:

Copy the production database (prod) to a temporary database (dwh_tmp) using pg_dump
Generate the fact and dimension tables in the temporary schema using the data from the production database
Rename the current DWH database (dwh -> dwh_old) and move in the temporary database instead (dwh_tmp -> dwh) using ALTER DATABASE dwh_tmp RENAME TO dwh

The problem is that the last step fails if there are still open connections to the dwh database. To get rid of these connections (e.g. using pg_terminate_backend()) I would need to run the DWH generation process with super-user privileges, which I'd like to avoid for security reasons.

Any suggestions on how to implement this in a more robust way? My main goal is to make sure that the DWH is always in a good state and the downtime is as small as possible, hence my idea of first building a temporary DWH and then renaming it. I'm not in direct control of the connecting processes that keep me from renaming the database (as they're openend from a different host).

--we drop the temporary schema if it already exists DROP SCHEMA IF EXISTS dwh CASCADE; --we create the temporary schema CREATE SCHEMA dwh; --we modify the search path for this session SET search_path to dwh; --create your DWH here... --now we commit the new version --make sure you're connected to the correct database when executing this ;) DROP SCHEMA IF EXISTS public CASCADE; --we rename the DWH schema to public ALTER SCHEMA dwh RENAME TO public; --you might need to restore permissions on the new public schema... GRANT USAGE ON SCHEMA public TO dwh_user; GRANT SELECT ON ALL TABLES IN SCHEMA public TO dwh_user;

Best Answer

For reference, here's the solution that I came up with:

Create a new schema dwh
Build the new version of the DWH in that shema
Drop the old public schema
Rename the new dwh schema to public

(this assumes that your main schema is named public, of course)

This method works quite reliably and without interrupting/killing existing database connections, so I prefer the solution over my initial "create and rename database" approach. Here's the SQL that I use for building the DWH (it assumes that you're connected to the DWH database):

Apart from needing to manually restore permissions and having to recreate extensions in the new schema, I found this solution to be quite optimal as it does not require superuser privileges for the DWH user, and works without creating/dropping any databases. I have not yet figured out a way to copy permissions from one schema to another, which is my only minor issue with this approach.

Best Answer

Related Solutions

PostgreSQL for high volume transactions and for Data warehousing

PostgreSQL – How to Prevent Writes to a Schema Temporarily

Related Question