Postgresql – Streaming Replication in PostgreSQL

postgresqlreplication

I'm trying to set up two PostgreSQL servers on one machine and perform a streaming replication. I have succeeded once, but when i've tried again following exactly the same steps it doesn't work.. Those are the steps:
I have $PGDATA = home/postgresql/9.1/data
and $STANDBY = home/postgresql/9.1/data2

  1. Set up two nodes:
    initdb -D $PGDATA
    initdb -D $STANDBY
  2. In the master node create a user for replication. I do that in pgAdmin (it does have superuser privileges)
  3. In the master node in pg_hba.conf add the part that allows standby to connect:
    host replication repuser 127.0.0.1/0 md5
  4. In the master node in postgresql.conf set:

    max_wal_senders = 1
    archive_mode = on
    archive_command = 'cp %p ~/postgresql/backup/archivedir/%f'
    wal_level = archive
    wal_keep_segments = 32

  5. Start the master node and do the base backup:

    psql -d dellstore2 -c "SELECT pg_start_backup('backup for replication',   true)"
    rsync -av ${PGDATA}/ $STANDBY --exclude postmaster.pid
    psql -d dellstore2 -c "select pg_stop_backup()"
    

    pg_stop_backup says that everything is fine, all the WAL files were archived

  6. In the standby (data2) node I create recovery.conf with:

    standby_mode = 'on'
    primary_conninfo = 'host=127.0.0.1 port=5432 user=repuser password=haslo'
    trigger_file = '/home/michau/postgresql/replication.trigger'
    restore_command = 'cp /home/michau/postgresql/backup/archivedir/%f "%p"'

  7. Start the master node, then start the standby node – replication should start and standby should catch up with the master. That was exactly what happened the first time.
    Now when I start the standby I get: "Address already in use" error.
    Of course both standby and master have the same port specified in postgresql.conf (they have exactly the same postgresql.conf files). If I change the port in standby to let's say 5433 then I get:

    LOG:  database system was shut down in recovery at 2012-06-12 19:48:01 CEST
    LOG:  entering standby mode
    cp: cannot stat /home/michau/postgresql/backup/archivedir/000000010000000000000007: No such file or directory
    LOG:  consistent recovery state reached at 0/7000070
    LOG:  record with zero length at 0/7000070
    cp: cannot stat /home/michau/postgresql/backup/archivedir/000000010000000000000007: No such file or directory
    LOG:  streaming replication successfully connected to primary
    LOG:  redo starts at 0/7000070
    

And it just hangs here. Running ps -ef | grep postgresql yields:

michau    2491  1898  0 19:46 pts/0    00:00:00 postgres -D /home/michau/postgresql/9.1/data
michau    2493  2491  0 19:46 ?        00:00:01 postgres: writer process 
michau    2494  2491  0 19:46 ?        00:00:00 postgres: wal writer process
michau    2495  2491  0 19:46 ?        00:00:00 postgres: autovacuum launcher process
michau    2496  2491  0 19:46 ?        00:00:00 postgres: archiver process   last was 000000010000000000000008
michau    2497  2491  0 19:46 ?        00:00:00 postgres: stats collector process
michau    2571  2214  0 19:49 pts/1    00:00:00 postgres -D /home/michau/postgresql/9.1/data2
michau    2572  2571  0 19:49 ?        00:00:01 postgres: startup process   recovering 000000010000000000000009
michau    2575  2571  0 19:49 ?        00:00:01 postgres: writer process
michau    2578  2571  0 19:49 ?        00:00:02 postgres: wal receiver process   streaming 0/99782DC
michau    2579  2491  0 19:49 ?        00:00:00 postgres: wal sender process repuser 127.0.0.1(42142) streaming 0/99782DC
michau    2586  2491  0 19:51 ?        00:00:00 postgres: michau postgres ::1(49941) idle
michau    2587  2491  0 19:51 ?        00:00:01 postgres: michau dellstore2 ::1(49942) idle

The recovering 0000000010000009 where changing for a while, but for half an hour it doesn't anymore.

I'm sure there is something I must have done the first time and not written down or something, but I am at a complete loss to say what it was. I would appreciate any help.

Best Answer

PostgreSQL replicas never finish recovering. This is by design. Basically a replica is always in "recovering from disaster" mode except that it is using receiving the WAL segments from the master rather than on disk.

So what you are seeing is not cause for concern. If it is not working yet, then you will need to provide a more detailed description of what you are trying to do and what is not working. But as far as you are posting it seems normal.