Postgresql – Disable all constraints and table checks while restoring a dump

check-constraintsdatabase-designpg-dumppostgresqlpostgresql-9.1

I've obtained a dump of my PostgreSQL database with:

pg_dump -U user-name -d db-name -f dumpfile

which I then proceed to restore in another database with:

psql X -U postgres  -d db-name-b -f dumpfile

My problem is that the database contains referential constraints, checks and triggers and some of these (checks it would seem in particular) fail during restoration since the information is not loaded in the order that would cause those checks to be honored. For instance, inserting a row in a table may be associated with a CHECK that calls a plpgsql function that checks whether a condition holds in some other unrelated table. If that latter table is not loaded by psql before the former, an error occurs.

The following is a SSCCE that produces such a database that once dumped with pg_dump cannot be restored:

CREATE OR REPLACE FUNCTION fail_if_b_empty () RETURNS BOOLEAN AS $$
    SELECT EXISTS (SELECT 1 FROM b)
$$ LANGUAGE SQL;

CREATE TABLE IF NOT EXISTS a (
     i              INTEGER                    NOT NULL
);

INSERT INTO a(i) VALUES (0),(1);
CREATE TABLE IF NOT EXISTS b (
    i  INTEGER NOT NULL
);
INSERT INTO b(i) VALUES (0);

ALTER TABLE a ADD CONSTRAINT a_constr_1 CHECK (fail_if_b_empty());

Is there a way to disable (from the command line) all such constraints during dump restoration and enable them back again afterwards? I am running PostgreSQL 9.1.

Best Answer

So you look up other tables in a CHECK constraint.

CHECK constraints are supposed to run IMMUTABLE checks. What passes OK for a row at one time should pass OK at any time. That's how CHECK constraints are defined in the SQL standard. That's also the reason for this restriction in the manual:

Currently, CHECK expressions cannot contain subqueries nor refer to variables other than columns of the current row.

Still, expressions in CHECK constraints are allowed to use functions, even user-defined functions. Those should be IMMUTABLE, but Postgres does not currently enforce this. According to this related discussion on pgsql-hackers, one reason is to allow references to the current time, which is not IMMUTABLE by nature.

But you are looking up rows of another table, which is completely in violation of how CHECK constraints are supposed to work. I am not surprised that pg_dump fails to provide for this.

Move your check in another table to a trigger (which is the right tool), and it should work with modern versions of Postgres.

PostgreSQL 9.2 or later

While the above is true for any version of Postgres, several tools have been introduced with Postgres 9.2 to help with your situation:

pg_dump option `--exclude-table-data`

A simple solution would be to dump the db without data for the violating table with:

--exclude-table-data=my_schema.my_tbl

Then append just the data for this table at the end of the dump with:

--data-only --table=my_schema.my_tbl

But complications with other constraints on the same table might ensue. There is an even better solution:

Solution: `NOT VALID`

Up to Postgres 9.1, the NOT VALID modifier was only available for FK constraints. This was extended to CHECK constraints in Postgres 9.2. The manual:

If the constraint is marked NOT VALID, the potentially-lengthy initial check to verify that all rows in the table satisfy the constraint is skipped. The constraint will still be enforced against subsequent inserts or updates [...]

A plain Postgres dump file consists of three "sections":

pre_data
data
post-data

^{Postgres 9.2 also introduced an option to dump sections separately with -- section=sectionname, but that's not helping with the problem at hand.}

Here is where it gets interesting. The manual:

Post-data items include definitions of indexes, triggers, rules, and constraints other than validated check constraints. Pre-data items include all other data definition items.

Bold emphasis mine.
You can change the offending CHECK constraint to NOT VALID, which moves the constraint to the post-data section. Drop and recreate:

ALTER TABLE a
  DROP CONSTRAINT a_constr_1
, ADD  CONSTRAINT a_constr_1 CHECK (fail_if_b_empty()) NOT VALID;

A single statement is fastest and rules out race conditions with concurrent transactions. (Two commands in a single transaction would work, too.)

This should solve your problem. You can even leave the constraint in that state, since that better reflects what it actually does: check new rows, but give no guarantees for existing data. There is nothing wrong with a NOT VALID check constraint. If you prefer, you can validate it later:

ALTER TABLE a VALIDATE CONSTRAINT a_constr_1;

But then you are back to the status quo ante.

Related Solutions

Postgresql – How to make pg_dump skip extension

For anyone looking for a workaround, limiting pg_restore to a specific schema has helped me get around this bug. See https://stackoverflow.com/a/11776053/11819

Postgresql – pg_restore: [archiver (db)] could not execute query: ERROR: schema “public” already exists

The error is harmless but to get rid of it, I think you need to break this restore into two commands, as in:

dropdb -U postgres mydb && \
 pg_restore --create --dbname=postgres --username=postgres pg_backup.dump

The --clean option in pg_restore doesn't look like much but actually raises non-trivial problems.

For versions up to 9.1

The combination of --create and --clean in pg_restore options used to be an error in older PG versions (up to 9.1). There is indeed some contradiction between (quoting the 9.1 manpage):

--clean Clean (drop) database objects before recreating them

and

--create Create the database before restoring into it.

Because what's the point of cleaning inside a brand-new database?

Starting from version 9.2

The combination is now accepted and the doc says this (quoting the 9.3 manpage):

--clean Clean (drop) database objects before recreating them. (This might generate some harmless error messages, if any objects were not present in the destination database.)

--create Create the database before restoring into it. If --clean is also specified, drop and recreate the target database before connecting to it.

Now having both together leads to this kind of sequence during your restore:

DROP DATABASE mydb;
...
CREATE DATABASE mydb WITH TEMPLATE = template0... [other options]
...
CREATE SCHEMA public;
...
CREATE TABLE...

There is no DROP for each individual object, only a DROP DATABASE at the beginning. If not using --create this would be the opposite.

Anyway this sequence raises the error of public schema already existing because creating mydb from template0 has imported it already (which is normal, it's the point of a template database).

I'm not sure why this case is not handled automatically by pg_restore. Maybe this would cause undesirable side-effects when an admin decides to customize template0 and/or change the purpose of public, even if we're not supposed to do that.