Postgresql – Fastest tool or technique for occasionally syncing Postgres 9.6 databases

postgresql

Occasionally I need to sync latest data and schema from production to staging db server. What tools or techniques do you recommend if syncing speed is a priority? The sync doesn't need to be in realtime.

Some inputs:

  • Postgres version is 9.6
  • DB size is less than 50GB
  • A full dump and restore takes about one hour. I want to beat this.

The staging db server is used for QA purpose. Whenever new code is merged, the staging app as well as the staging db will be updated. New code is merged several times a day.

One solution I'm thinking about: use something similar to the checkpoint concept.

  1. Initially, restore a full dump to staging server.
  2. Whenever a new code is merged:

    2.1 Restore the staging db state to the latest check point
    2.2 Get the latest state from production
    2.3 Save a new checkpoint

Is it a good direction, and could I use the WAL as the "checkpoint" here?
Thank you.

Best Answer

Occasionally I need to sync latest data and schema from production to staging db server. What tools or techniques do you recommend if syncing speed is a priority? The sync doesn't need to be in realtime.

Well, normally in such a schemas you have to wipe the stating. You didn't say that explicitly but it's critical to your question. If you have people writing to staging and developing with staging then you can't just update it and assume that only the master wrote to it. In such a case your own method is to wipe and reload.

In order to that faster, you can simply, back up the data-directory or use pg_basebackup. Then when you need to reset the database,

  1. Shut down the database,
  2. Restore the data directory.
  3. Copy over the wal files from the archive into ./pg_wal
  4. Restart the server.

Note you'll have to keep around all of the WAL logs since your last base backup.

This is pretty well documented in the PostgreSQL 11 docs here,

Related Question