Postgresql – Problems restoring backup from base backup + WAL with WAL-E

linuxpostgresqlpostgresql-9.1write-ahead-logging

I'm trying to do nightly restores of the main database on a separate node. I want to do weekly full restores and nightly restores by consuming the latest WAL files.

On the master I have

wal_level = hot_standby
archive_command = 'wal-e --s3-prefix=s3://... wal-push %p'

and I've done a full backup with:

wal-e --s3-prefix s3://... backup-push <pg data>

On the backup node I have done:

PGDATA=.. initdb
wal-e backup-fetch $PGDATA LATEST

which works and I'm able to connect and select data. Nice.

So to get it to consume the latest WAL archives I added a $PGDATA/recovery.conf:

restore_command = 'wal-e --s3-prefix ... wal-fetch "%f" "%p"'
standby_mode = 'off'

When start the server I expected it to go ahead and look for WAL segments to consume before starting in non-recovery mode, but I'm getting this:

LOG:  database system was interrupted; last known up at 2015-03-21 12:20:50 CET
wal_e.operator.backup INFO     MSG: begin wal restore
        STRUCTURED: time=2015-03-21T12:16:38.820461-00 pid=23630 action=wal-fetch key=s3://.../wal_005/00000005.history.lzo prefix=...
/ seg=00000005.history state=begin
lzop: <stdin>: not a lzop file
wal_e.blobstore.s3.s3_util WARNING  MSG: could no longer locate object while performing wal restore
        DETAIL: The absolute URI that could not be located is s3://.../wal_005/00000005.history.lzo.
        HINT: This can be normal when Postgres is trying to detect what timelines are available during restoration.
        STRUCTURED: time=2015-03-21T12:16:39.829223-00 pid=23630
wal_e.operator.backup INFO     MSG: complete wal restore
        STRUCTURED: time=2015-03-21T12:16:39.830287-00 pid=23630 action=wal-fetch key=s3://.../wal_005/00000005.history.lzo prefix=.../ seg=00000005.history state=complete
LOG:  starting archive recovery
wal_e.operator.backup INFO     MSG: begin wal restore
        STRUCTURED: time=2015-03-21T12:16:39.900355-00 pid=23638 action=wal-fetch key=s3://.../wal_005/00000005000000050000009D.lzo prefix=.../ seg=00000005000000050000009D state=begin
lzop: <stdin>: not a lzop file
wal_e.blobstore.s3.s3_util WARNING  MSG: could no longer locate object while performing wal restore
        DETAIL: The absolute URI that could not be located is s3://.../wal_005/00000005000000050000009D.lzo.
        HINT: This can be normal when Postgres is trying to detect what timelines are available during restoration.
        STRUCTURED: time=2015-03-21T12:16:40.896215-00 pid=23638
wal_e.operator.backup INFO     MSG: complete wal restore
        STRUCTURED: time=2015-03-21T12:16:40.899320-00 pid=23638 action=wal-fetch key=s3://.../wal_005/00000005000000050000009D.lzo prefix=.../ seg=00000005000000050000009D state=complete
WARNING:  WAL was generated with wal_level=minimal, data may be missing
HINT:  This happens if you temporarily set wal_level=minimal without taking a new base backup.
LOG:  consistent recovery state reached at 5/9D009DA8
LOG:  record with zero length at 5/9D009DA8
LOG:  redo is not required

So the questions are;

  1. is my approach fundamentally wrong? Is there an alternative way to get this to work?

  2. PostgreSQL is claiming I'm having wal_level=minimal, how can I check the level of the produced WAL files? On the main server I get this when checking it's actual settings:

    select * from pg_settings where name='wal_level';
    -[ RECORD 1 ]------------------------------------------------
    name       | wal_level
    setting    | hot_standby
    unit       |
    category   | Write-Ahead Log / Settings
    short_desc | Set the level of information written to the WAL.
    extra_desc |
    context    | postmaster
    vartype    | enum
    source     | configuration file
    min_val    |
    max_val    |
    enumvals   | {minimal,archive,hot_standby}
    boot_val   | minimal
    reset_val  | hot_standby
    sourcefile | /.../postgresql.conf
    sourceline | 155
    

EDIT: adding details on archive_command on master

Best Answer

wal-e backup push 

will NOT push any wal files to s3.

So this backup will be mostly unrecoverable.

Use

archive_command = '... wal-e wal-push %p'

on master to make this backup usable.