I have a process that build a data warehouse using Postgres. The process runs every day at midnight and works in three steps:
- Copy the production database (prod) to a temporary database (dwh_tmp) using pg_dump
- Generate the fact and dimension tables in the temporary schema using the data from the production database
- Rename the current DWH database (dwh -> dwh_old) and move in the temporary database instead (dwh_tmp -> dwh) using
ALTER DATABASE dwh_tmp RENAME TO dwh
The problem is that the last step fails if there are still open connections to the dwh database. To get rid of these connections (e.g. using pg_terminate_backend()
) I would need to run the DWH generation process with super-user privileges, which I'd like to avoid for security reasons.
Any suggestions on how to implement this in a more robust way? My main goal is to make sure that the DWH is always in a good state and the downtime is as small as possible, hence my idea of first building a temporary DWH and then renaming it. I'm not in direct control of the connecting processes that keep me from renaming the database (as they're openend from a different host).
Best Answer
For reference, here's the solution that I came up with:
dwh
public
schemadwh
schema topublic
(this assumes that your main schema is named
public
, of course)This method works quite reliably and without interrupting/killing existing database connections, so I prefer the solution over my initial "create and rename database" approach. Here's the SQL that I use for building the DWH (it assumes that you're connected to the DWH database):
Apart from needing to manually restore permissions and having to recreate extensions in the new schema, I found this solution to be quite optimal as it does not require superuser privileges for the DWH user, and works without creating/dropping any databases. I have not yet figured out a way to copy permissions from one schema to another, which is my only minor issue with this approach.