Postgresql – Recovery using recovery.conf

postgresqlrestoreUbuntu

I am trying to recover a database using recovery.conf file for making point-in-time copy.

On recovery.conf file, I added a recovery target time as below.

recovery_target_time = '2011-06-16 04:10:00 IST'

But in the logs I can see if it advances the time by 3.5 hours to 07:40 and actual recovery is attempted for this time, not the time I given in recovery_target_time.

2011-06-16 12:53:45 IST LOG:  starting archive recovery
2011-06-16 12:53:45 IST LOG:  restore_command = '/app/postgresinstall/bin/pg_standby -l /dbdata/archive %f %p %r'
2011-06-16 12:53:45 IST LOG:  recovery_target_time = '2011-06-16 07:40:00+05:30'
cp: cannot stat `/dbdata/archive/00000001.history': No such file or directory

Why this difference of 3.5 hours and I need help in figuring out whats going on in deciding the recovery target time?

Update: Here are time details from database

postgres=# SELECT EXTRACT(timezone_hour FROM now()),EXTRACT(timezone_minute FROM now());
 date_part | date_part 
-----------+-----------
         5 |        30
(1 row)

postgres=# select now();
               now                
----------------------------------
 2011-08-12 13:08:19.333068+05:30
(1 row)

OS Date/Time:
postgres@xxxxx:~$ date
Fri Aug 12 13:10:19 IST 2011

I faced the same problem when I try to restore from WAL archives on same server and my backup server. Also I can consistently reproduce this problem on both servers.

So if I need to create a near to current time backup, I use a recovery_target_time 3.5 hours behind current time and start the DB with recovery.conf

Best Answer

DISCLAIMER : I am not a PostgreSQL DBA, though I dabble a lot with it.

You probably need to check your timezone in the OS and in psql.

In the OS run this:

[postgres@radarPG-db1 ~]$ date
Fri Jun 17 12:55:37 EDT 2011

In psql, run the following:

postgres=# SELECT EXTRACT(timezone_hour FROM now()),EXTRACT(timezone_minute FROM now());
 date_part | date_part
-----------+-----------
        -4 |         0

postgres=# select now();
              now
-------------------------------
 2011-06-17 12:54:39.195291-04

Many times, some forget to have postgres default the timezone to that of the OS. If the WAL files also contain the timezone in its internal timestamps, you must align the timezone set in postgres with that of the WAL files.

Something else to consider is the following:

The default behavior of recovery is to recover along the same timeline that was current when the base backup was taken. If you want to recover into some child timeline (that is, you want to return to some state that was itself generated after a recovery attempt), you need to specify the target timeline ID in 'recovery.conf'. You cannot recover into timelines that branched off earlier than the base backup.

UPDATE 2011-06-20 15:08 EDT

This is right from the online documentation. Please look this over and see if you missed any WAL files along the way. If you waited too long to setup the WAL files on the new server, postgres may have deleted them on startup. You may need to find those WAL files. See if they reside back in pg_xlog folder from the old server.

Recovering using a Continuous Archive Backup

Stop the server, if it's running.
If you have the space to do so, copy the whole cluster data directory and any tablespaces to a temporary location in case you need them later. Note that this precaution will require that you have enough free space on your system to hold two copies of your existing database. If you do not have enough space, you need at the least to copy the contents of the pg_xlog subdirectory of the cluster data directory, as it might contain logs which were not archived before the system went down.
Clean out all existing files and subdirectories under the cluster data directory and under the root directories of any tablespaces you are using.
Restore the database files from your base backup. Be careful that they are restored with the right ownership (the database system user, not root!) and with the right permissions. If you are using tablespaces, you should verify that the symbolic links in pg_tblspc/ were correctly restored.
Remove any files present in pg_xlog/; these came from the backup dump and are therefore probably obsolete rather than current. If you didn't archive pg_xlog/ at all, then recreate it, being careful to ensure that you re-establish it as a symbolic link if you had it set up that way before.
If you had unarchived WAL segment files that you saved in step 2, copy them into pg_xlog/. (It is best to copy them, not move them, so that you still have the unmodified files if a problem occurs and you have to start over.)
Create a recovery command file recovery.conf in the cluster data directory (see Recovery Settings). You might also want to temporarily modify pg_hba.conf to prevent ordinary users from connecting until you are sure the recovery has worked.
Start the server. The server will go into recovery mode and proceed to read through the archived WAL files it needs. Should the recovery be terminated because of an external error, the server can simply be restarted and it will continue recovery. Upon completion of the recovery process, the server will rename recovery.conf to recovery.done (to prevent accidentally re-entering recovery mode in case of a crash later) and then commence normal database operations.
Inspect the contents of the database to ensure you have recovered to where you want to be. If not, return to step 1. If all is well, let in your users by restoring pg_hba.conf to normal.

Related Solutions

Postgresql – Initial recovery Postgres with backup in sync commit mode

If you don't archive wal's then how do you suppose the slave is going to get them?

You have to archive, and then recovery.conf should read from the archive.

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

The message "The database system is starting up." does not indicate an error. The reason it is at the FATAL level is so that it will always make it to the log, regardless of the setting of log_min_messages:

http://www.postgresql.org/docs/9.1/interactive/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-WHEN

After the rsync, did you really run what you show?:

pgsql -c "select pg_stop_backup();";

Since there is, so far as I know, no pgsql executable, that would leave the backup uncompleted, and the slave would never come out of recovery mode. On the other hand, maybe you really did run psql, because otherwise I don't see how the slave would have logged such success messages as:

Log: consistent recovery state reached at 0/BF0000B0

and:

Log: streaming replication successfully connected to primary

Did you try connecting to the slave at this point? What happened?

The "Success. You can now start..." message you mention is generated by initdb, which shouldn't be run as part of setting up a slave; so I think you may be confused about something there. I'm also concerned about these apparently conflicting statements:

The only ways I have restarted Postgres is through the service postgresql-9.1 restart or /etc/init.d/postgresql-9.1 restart commands. After I receive this error, I kill all processes and again try to restart the database...

Did you try to stop the service through the service script? What happened? It might help in understanding the logs if you prefixed lines with more information. We use:

log_line_prefix = '[%m] %p %q<%u %d %r> '

The recovery.conf script looks odd. Are you copying from the master's pg_xlog directory, the slave's active pg_xlog directory, or an archive directory?

Best Answer

Related Solutions

Postgresql – Initial recovery Postgres with backup in sync commit mode

PostgreSQL 9.1 Hot Backup Error: the database system is starting up

Related Question