It sounds like PostgreSQL is set up to recover from log shipping rather than by connecting as a replication user. Please double- and triple-check your recovery.conf, and if that doesn't work, post it here.

The approach you are taking is a valid one, though. It means that recovery will simply wait for the next segment to arrive, producing the message you are seeing; each segment must be shipped by the `archive_command` configured in the master's `postgresql.conf` and picked up by the `restore_command` in the standby's `recovery.conf`.
Most likely what you're seeing is a huge `checkpoint_segments` value and a long `checkpoint_timeout`; alternately, they might have set `wal_keep_segments` to a very large value if it's supposed to support streaming replication.
You can force a checkpoint with the `CHECKPOINT` command. This may stall the database for some time if it has accumulated a huge amount of WAL and hasn't been background-writing it. If `checkpoint_completion_target` is low (less than 0.8 or 0.9) then there's likely to be a big backlog of work to do at checkpoint time. Be prepared for the database to become slow and unresponsive during the checkpoint. You cannot abort a checkpoint once it begins by normal means; you can crash the database and restart it, but that just puts you back where you were.
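For example, before forcing a checkpoint you can look at how the relevant settings are configured. A sketch, run in psql (forcing the checkpoint requires superuser):

```sql
-- Check the settings that govern how much WAL accumulates between checkpoints
SELECT name, setting
FROM pg_settings
WHERE name IN ('checkpoint_segments',
               'checkpoint_timeout',
               'checkpoint_completion_target',
               'wal_keep_segments');

-- Then force an immediate checkpoint; may stall the server as described above
CHECKPOINT;
```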
I'm not certain, but I have the feeling a checkpoint could also result in growth of the main database, and do so before any space is freed in the WAL (if any is freed at all). So a checkpoint could potentially push you into running out of space entirely, something that's very hard to recover from without at least temporarily adding more storage.
Now would be a very good time to get a proper backup of the database: use `pg_dump -Fc dbname` to dump each database, and `pg_dumpall --globals-only` to dump user definitions etc.
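A minimal sketch of those two backup commands as a script. The database name `mydb` and the file-naming scheme are assumptions, and the script only prints the commands so you can review them before running:

```shell
#!/bin/sh
DB=mydb                  # assumed database name; repeat the dump per database
STAMP=$(date +%Y%m%d)

# -Fc writes a compressed, custom-format archive that pg_restore can
# restore selectively (single tables, schema only, etc.)
DUMP_CMD="pg_dump -Fc $DB -f $DB-$STAMP.dump"

# Roles, tablespaces and other cluster-wide objects are NOT included in
# pg_dump output, so dump them separately
GLOBALS_CMD="pg_dumpall --globals-only -f globals-$STAMP.sql"

echo "$DUMP_CMD"
echo "$GLOBALS_CMD"
```

Drop the `echo`s once you've confirmed the commands against your own database names and backup location.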
If you can afford the downtime, stop the database and take a file-system level copy of the entire data directory (the folder containing `pg_xlog`, `pg_clog`, `global`, `base`, etc). Do not do this while the server is running, and do not omit any files or folders; they are all important (well, except `pg_log`, but it's a good idea to keep the text logs anyway).
If you'd like more specific comment on the likely cause (and so I can be more confident in my hypothesis), you can run the following queries and paste their output into your answer (in a code-indented block), then comment so I'm notified:
SELECT version();
SELECT name, current_setting(name), source
FROM pg_settings
WHERE source NOT IN ('default', 'override');
It is possible that setting `checkpoint_completion_target = 1` and then stopping and restarting the DB might cause it to start aggressively writing out the queued-up WAL. It won't free any until it completes a checkpoint, but you could force one once write activity slows down (as measured with sar, iostat, etc). I have not tested whether `checkpoint_completion_target` affects already-written WAL when changed across a restart; consider trying this first on a throwaway PostgreSQL instance you `initdb` on another machine.
Backups have nothing to do with WAL retention and growth; it isn't backup related.
Best Answer
To answer your questions:
Set the `checkpoint_timeout` lower, as you did, to 30 seconds or so. I made sure to enable `log_checkpoints` in my `postgresql.conf` and used check_postgres.pl to monitor the number of segments and the replication lag. You can also set `checkpoint_completion_target = 0` to adjust the checkpointing behavior. But really, on a replica, `checkpoint_timeout` is the only way to keep the `pg_xlog` directory down in size. Additional reference here: pgsql bugs #7801.

Adjusting `checkpoint_timeout` is unfortunately the best option. There are some tips on dealing with `pg_xlog` filling up in this blog post: Solving pg_xlog out of disk space problems, but these mostly apply to a primary that has run out of space, and the panic related to that.

PostgreSQL WAL segments take up approximately this much room, according to the documentation for 9.1 WAL Configuration:

    (2 + checkpoint_completion_target) * checkpoint_segments + 1

or

    checkpoint_segments + wal_keep_segments + 1

files. So if you do the math with, for example, `checkpoint_completion_target = 0.9` and `checkpoint_segments = 50`:

    (2 + 0.9) * 50 + 1 = 146 segments * 16 MB = 2336 MB
Which is only about 45% of your 5 GB `pg_xlog` filesystem. So long as you aren't getting checkpoint warnings in your logs, and the space isn't in danger of being completely used up, 5 GB should be fine. Using these formulas, you should be able to appropriately size your `pg_xlog` filesystem for your usage.

Greg Smith has a lovely chart in PostgreSQL 9.0 High Performance, referenced here: Server Configuration Tuning Practices. I would also recommend the book as a reference; even though it's a bit old, a lot of the information in it is still quite applicable.
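Put together, the replica-side settings discussed above might look like this in `postgresql.conf`. These values are illustrative, not prescriptive; tune them for your workload:

```
# postgresql.conf (replica) -- illustrative values only
checkpoint_timeout = 30s            # checkpoint often so pg_xlog segments are recycled sooner
checkpoint_completion_target = 0    # don't spread checkpoint writes out over time
log_checkpoints = on                # log each checkpoint so you can monitor frequency
```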
Hope that helps. =)