Postgresql – Streaming replication and lag when performing VACUUM on master

postgresqlpostgresql-9.4replication

My streaming replication standby server is showing a delay of a few minutes when vacuuming a large table on the master.

What is the reason for the lag, and is there a way to reduce it?

Both servers have identical hardware on a 10GB network. Relevant settings are:

wal_buffers = 16MB
wal_writer_delay = 20ms
checkpoint_segments = 16
shared_buffers = 14GB
archive_timeout = 10
maintenance_work_mem = 1GB

Best Answer

Nick, based on the information we've gathered, I'm going to venture a guess, but if I'm wrong, come back and let us know and there may still be something we can do. I wouldn't say I'm confident about the internals of auto-vacuum processes on hot standby servers.

Streaming Replication, aka Log-Shipping

Streaming replication, hot standby, log shipping...there's a lot of different names that people use, but the general idea is that you replicate your back up server by implementing a system where the transaction logs from your primary server are sent and re-implmented on a secondary server to keep the two identical.

Now, you've stated that when you VACUUM your table on the primary, you have significant lag on your standby, and also that your configuration for log_autovacuum_min_duration is set to -1, which disables logging of autovacuum actions. As a potential remedy, you could set this value to 0, indicating you wish to log all autovacuum operations.

My guess is that, while autovacuuming is occurring periodically on your master, since it isn't being logged, it isn't replicated on the standby. When you run a manual VACUUM, only then does the logged VACUUM become replicated, which may be what's eating into your performance on the standby.

How are things if you run a VACUUM on the master, followed immediately by another VACUUM on master? This will give us some indication of if I'm right or if I'm full of hot air.