Postgresql – Postgres 9.5 xlog fills all disk space in replication setup with replication slot

postgresql-9.5replication

I have replication set up on Postgresql 9.5 using replication slot. I have one primary and one slave machine, both Ubuntu 14.04 with a cluster called main that is in a directory other than the default. On both servers separately the database worked as expected (before setting up replication), including access from another Ubuntu server.

On the primary, I have in the postgresql.conf:

wal_level = hot_standby
archive_mode = on
max_replication_slots = 8
archive_timeout = 60
max_wal_senders = 8
wal_keep_segments = 100
logging_collector = on

I added a replication user to pg_hba.conf:

host    replication     repuser         [iprange]         md5

I restarted the db and created a replication slot like so (as sudo -u postgres psql postgres):

SELECT * FROM pg_create_physical_replication_slot('repslot001');

On the slave machine, I started with the pg_basebackup:

sudo -u postgres pg_basebackup -v -h [ipofprimary] -U repuser -D [mydatadir] -R -Xs -P

I added to postgresql.conf:

hot_standby = on

My recovery.conf:

standby_mode = 'on'
primary_conninfo = 'user=repuser password=geheim host=[ipofprimary] port=[port] sslmode=prefer sslcompression=1 krbsrvname=postgres'
primary_slot_name = 'repslot001'
trigger_file = '[datadirectory]/finish.recovery'
recovery_target_timeline='latest'

I restarted the db and everything seemed fine. I went home at this point.

The next day my master server had crashed because the disk was full:

PANIC:  could not write to file "pg_xlog/xlogtemp.11700": No space left on device

I googled around and found that this happens if the archive_command, as set in postgresql.conf on the master, is incorrect or missing.

But it is my understanding that when using replication slots I don't need an archive_command (and so, I didn't set an archive command). What mistake did I make?

If you need more information to be able to answer this, please let me know. But note that I need to continue my work, I am now trying if I can promote slave to master and clean up the old broken master. I'll make a backup of the data directory, which also has the logs. Also note that this is a test setup, I will not lose any data. Only when I'm confident I know how it works and how to set it up correctly will I start using replication in a production environment.

Thank you.

Edit.
The mistake I made was that I mixed up using archiving and using replication slots (thanks Frankie for your patience). I now have the following in my postgresql.conf – and it is working as expected:

wal_level = hot_standby
max_wal_senders = 8
max_replication_slots = 1
hot_standby = on

I removed the archiving settings archive_mode and archive_command.

Best Answer

If you just want a hot standby master/slave setup, without, eg point in time recovery, just set archive_mode to false or remove it from your postgres.conf. Otherwise you'll have to come up with a storage plan so you have enough space which might involve setting up archive_command to copy the archive to a disk with enough space.