PostgreSQL – Move Database to New Server and Update with Minimum Downtime

postgresqlpostgresql-9.3postgresql-9.6recovery

I have a PostgreSQL 9.3 database, which I have to move to a new server and update to 9.6 with minimum downtime.
I thought I could :

  1. on the old Postgres, activate archive_command to copy WAL to a NFS directory mounted on both the old and new server
  2. pg_dumpall the old Postgres
  3. install a Postgres 9.6 on the new server
  4. import the SQL dump
  5. restart the new server in recovery mode, pointing to the NFS directory to get the WAL from the old server
    So when the old and new Postgres are in sync, I can stop the old, verify that the new is up-to-date, change applications configuration to the new Postgres, and restart the applications.

I tested this with some test databases, and it does not work: on the new postgres the latest checkpoint's redo wal file has a number which is higher than the WAL number on the old. When I import the sql dump some WAL are generated, so the WAL number get incremented. So when I restart the new in recovery mode it does not read the old Postgres WAL.

I'm not sure this method is possible at all.

Could someone advise me another method if this one is not possible?

Best Answer

You can't do this, because WAL archive can only work with physical copies, while pg_dumpall and restore generates a logical copy. Also, WAL format is not compatible across major versions, so you couldn't re-play WAL from one version to another anyway. (You can do something like that with logical replication, but that doesn't exist in version 9.3, and is a rather advanced topic requiring extensive design and testing in the newer versions.)

Is it really necessary to move versions and servers at the same time? If something goes wrong during such a double migration, you've just created a big diagnostic mess by changing two major things at the same time.

The lowest down-time way to move versions (other than logical replication) is to use pg_upgrade -k. This requires you to have both old and new versions installed on the same machine simultaneously.

The lowest down-time way to move machines is to set up streaming replication; and wait for it catch up (as much as it can get while the master is still active), then shutdown all user connections to the master, wait for all WAL to replay to the replica, and promote the replica.