What configurations affect the speed of DDL statements in PostgreSQL?

ddl, performance, postgresql, query-performance

I would like to set up PostgreSQL in such a way that dropping and recreating the entire database (~100 tables/views plus a few rows per table) is as fast as possible.
This is for a development environment where data integrity is not important.
This database reset will happen very often.

What are the configuration options that might help with this (or psql meta-commands that can help speed this process up)?

EDIT:
Imagine I have a single SQL file (that I constantly edit and save) that creates everything needed in an empty database.

BEGIN;
CREATE TABLE ...
INSERT INTO ...
CREATE TABLE ...
INSERT INTO ...
COMMIT;

I need it to run as fast as possible. Right now it takes a few seconds even for about 10 tables; I would like it to be under 1 second.
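
For reference, I run it with something like this (reset.sql and devdb are placeholder names):

# drop and recreate the database, then replay the schema file in one transaction
dropdb --if-exists devdb
createdb devdb
psql --single-transaction -v ON_ERROR_STOP=1 -f reset.sql devdb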

Best Answer

In what sense is database integrity not important? Even if you don't care about integrity, if the production database doesn't accept entries in table baz because of a foreign key constraint against table bar, do you want that constraint emulated in development? Or do you want to find out you forgot about it when you move to production? Personally, I would not sacrifice integrity in development.

Official toolkit: pg_dump and pg_restore

I would personally use pg_dump and pg_restore. They're configurable, user-friendly, and very easy to use. That said, there are faster options, but I have never had a need for them in my use cases. The official tools don't just support binary dumps, they also work between versions fairly well. pg_restore -1 wraps the whole restore in a single transaction, and with wal_level = minimal that lets PostgreSQL skip WAL-logging the data loaded into tables created in that same transaction.
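
A minimal round trip with those tools might look like this (foobar and the dump filename are placeholders):

# dump in the custom (binary) format
pg_dump -Fc -f foobar.dump foobar

# recreate the database and restore inside a single transaction
dropdb --if-exists foobar
createdb foobar
pg_restore -1 -d foobar foobar.dump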

Raw Data with TABLESPACE

If you're willing to make raw backups, you can get far greater speed. Here is how:

  1. Create a folder somewhere like /data/dbs and chown it to be owned by the user/group postgres

    mkdir -p /data/dbs;
    chown postgres:postgres /data/dbs;
    
  2. Make this into a TABLESPACE

    CREATE TABLESPACE dbspace LOCATION '/data/dbs';
    
  3. Create a database there:

    CREATE DATABASE foobar WITH TABLESPACE = dbspace;
    

Now you can connect to the new database and do some stuff.
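
For example, with psql:

psql -d foobar

Once connected: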

CREATE TABLE foo ( idfoo serial PRIMARY KEY, name text );
CREATE TABLE baz ( idbaz serial, idfoo int REFERENCES foo );
INSERT INTO foo (name) VALUES ('foo'), ('bar');
INSERT INTO baz (idfoo) SELECT idfoo FROM foo;

Now you can disconnect and copy the data directory to a backup. This process is easily scripted. For me,

  • the data directory is PG_9.5_201510051; for you it may be different.
  • the command to stop PostgreSQL runs through /usr/sbin/service (I use Ubuntu).

So for me, it looks like this.

sudo service postgresql stop
sudo cp -pR PG_9.5_201510051/ backup
sudo service postgresql start

Reconnect.

DROP TABLE foo CASCADE;

Then, when we disconnect and stop the service again, we can restore the backup (also easily scripted):

sudo service postgresql stop
sudo rm -rf PG_9.5_201510051
sudo cp -pR backup PG_9.5_201510051
sudo service postgresql start
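
Since both directions are just stop/copy/start, the whole cycle fits in a small wrapper script. Here's a sketch (pg_reset.sh is a made-up name, and the paths match my setup above):

#!/bin/sh
# pg_reset.sh -- save or restore a raw copy of the tablespace directory
set -e
cd /data/dbs
sudo service postgresql stop
if [ "$1" = "save" ]; then
    # replace the old backup with a fresh copy of the live directory
    sudo rm -rf backup
    sudo cp -pR PG_9.5_201510051 backup
elif [ "$1" = "restore" ]; then
    # throw away the live directory and bring back the saved copy
    sudo rm -rf PG_9.5_201510051
    sudo cp -pR backup PG_9.5_201510051
else
    echo "usage: $0 save|restore" >&2
    exit 1
fi
sudo service postgresql start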

Upon reconnecting, foo is magically back and restored. You can't get faster than this...

Not fast enough?

If this isn't fast enough, for instance because your database has terabytes of data or something equally crazy:

  • You can use rsync to synchronize only the files that changed (see the sketch after this list).
  • You can likely use ZFS or Btrfs snapshots on a block device mounted at the data directory. YMMV.
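
For the rsync route, the cp in the restore step above could be swapped out like this (same placeholder paths as before):

sudo service postgresql stop
# copy only the files that differ between the backup and the live directory
sudo rsync -a --delete backup/ PG_9.5_201510051/
sudo service postgresql start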