well, you cannot switch back easily.
what you can do is to use the old backup method:
SELECT pg_start_backup('whatever');
rsync the data dir
SELECT pg_stop_backup();
this is the best way as not all the data has to be transported over the network.
i definitely recommend this replacement for pg_basebackup because it is the only RELIABLE way to get things back in order (nobody knows for sure what happens before the crash).
sure, there might be other tricks out there but i would strongly vote for this method.
It looks like you asked 2 questions:
Question 1
Does this mean I should go with multi-master solution if I want to
promote standby to accept write transaction as well?
Built-in PostgreSQL features are enough to get a WAL-based, read-only standby (a.k.a. secondary, a.k.a. slave) server. However, they are not enough to get multi-master operation.
This R/O slave can be promoted at any time to standalone, R/W server by using pg_ctl
.
pg_ctl promote -D /var/lib/postgresql/9.4/main
(note: pg_ctl
tool might be hidden in default debian/ubuntu setup. Look in /usr/lib/postgresql/X.Y/bin
to see it)
When it's promoted, replication stops and slave is disconnected from primary. See relevant fragments on promote
command in pg_ctl documentation and failover docs.
Question 2
Is postgres 9.4 BDR (Bi-Directional Replication) a good solution to
stream between M1 and S1? Or is there any commercial product can do
this?
I don't know BDR (maybe it's a good solution), but do you REALLY need both servers in R/W mode? If not, I strongly recommend using built-in streaming replication (with streaming or log shipping).
To redirect traffic from primary to standby you need some external tool which will do the failover procedure - using either dns-based or IP-based or other failover method. PostgreSQL itself does not know how to redirect traffic or do anything outside the database scope. Popular tools are pgpool (in layer 7) or Linux HA or corosync and friends (in lower layers).
Best Answer
Simply put, if the slave can be started up, both contents of master and slave are perfectly same.
I'll explain the details. The slave is restored from the
base backup
which is the physical database back up of the master by replayingWAL log
(transaction log). So, the slave is a perfect replica of master and the difference between master and slave never occurs if slave can be ran. In other words, if there are any differences between master and slave, the slave cannot be ran.The simplest test is to create a small table at the master and to confirm the table at the slave, which is mentioned at that document.
Master:
Slave:
If you can find the new table at the slave, the databases of slave is a perfect replica of master data.
Additionally, this is a very rare case, but the piece of data of slave may be corrupted by a failure of the copy tool. If you want to avoid this failure, you should do
VACUUM FULL
statement for all tables at the master server. If you doVACUUM FULL
, all tables will be reconstructed at the slave as well as the master. Therefore, the corrupted data of the slave will be restored. (It's a dirty trick.)