Postgresql – Best practices to run time offset delayed replication

best practicespostgresqlreplication

I want to setup replication system on postgresql like #1 MASTER SERVER and two replicated servers: #2 SERVER in realtime (for load balancing), #3 SERVER delayed with some time offset (for example 2 days) to prevent application and human mistakes on data update/delete

Someone got some experience with that kind of setup? What are real life disadvantages and problems with that solution?

Best Answer

If you are considering PostgreSQL 9.0 with Hot Standby, set up the archive so that the archive files only become available to the slave after a certain time. For example, you could have a little shell script that hardlinks the files from an incoming archive directory to an outgoing one based on file timestamps. Make sure you preserve the order. (Naturally, don't activate streaming replication if you do this.)

Related Solutions

Mysql – 3 Databases, Best Replication Setup

I wrote up an interesting layout last year which features DRBD pairs in two data centers (DC1,DC2) with as follows

DRBD Pair in DC1 (db1 and db2)
DBVIP for Primary of DRBD Pair1 is 10.1.2.30
DRBD Pair in DC2 (db3 and db4)
DBVIP for Primary of DRBD Pair2 is 10.1.2.40
Have MySQL Circular Replication Between DRBD Primaries
Have the 10.1.2.40 as Master_Host for DBRD Pair 1
Have the 10.1.2.30 as Master_Host for DBRD Pair 2

MySQL high availability, failover and replication with Latency

MySQL Replication : 1 Slave / Multiple Masters

Here is why I suggested this: Using two data centers, you setup automatic failover for DRBD Pair in one data center. Let the other DRBD Pair in the second data center be for DB disaser site with it own local redundancy and failover. Should you ever loses one data center, the other data center is fully read with it own local failover setup. Your app would just have to use the DBVIP of the other database center in such a catastrophic case.

Please keep in mind that using DRBD in conjunction with MySQL is only beneficial if all of your data uses the InnoDB Storage Engine. Hard failovers in DRBD could easily result in crashed MyISAM tables.

Here is another setup to consider:

As with DRBD setups, any DRBD Secondary would provide just a Disk-Level copy of your MySQL Folder. It is available as a warm standby. MySQL is not being run on the DRBD Secondary. If you want the third DB server to become hot standby, ditch DRBD altogether and use pure MySQL Replication. With three DB servers, using db3 at a remote site, simply setup the following:

+--> db1 --> db2 --> db3 -->+
^                           |
|                           V
+<--------------------------+

Using your rudimentary failover, now you have two hot standby servers. You just have to make sure each DB server has a unique server_id value. I also recommend using MySQL 5.5 because it uses SemiSync Replication which is more sensitive to communication failures and stop Replication better. You will have to setup the appropriate heartbeats and timeouts.

Mysql – Master – Master replication Duplicate entry error

The tutorial you linked to is, I'm afraid, very optimistic. It forgets to note that it only covers the case of an auto generated AUTO_INCREMENT PRIMARY KEY.

The methods it suggests do not cover the general case for a UNIQUE KEY, including, of course, a general PRIMARY KEY.

The article does mention that "The most common problem with replication is primary key collision". It then completely fails to mention all the other problems...

So, anyway, any UNIQUE KEY collision will render your replication broken, possibly in both directions.

Which table was it that failed your replication? Was that a table with an AUTO_INCREMENT PRIMARY KEY or a "general" PRIMARY KEY (ie on non-AUTO_ICNREMENT columns)?

To solve the general case of unique keys, your application must be very aware of its location (Netherlands or Washington DC), and must refrain from writing data that may cause collision. Alternatively it might want to use some synchronizing/locking mechanism; that would of course induce latency as data must cross the Atlantic.

Might I also suggest that the use of binlog_do_db is also naive and dangerous. Are you sure you don't want to replicate the mysql schema? What happens when you add new schemas? The general correct solution is to have full replication without filtering, or otherwise have a very good explanation why not to do so.

Finally, I highly discourage the use of active-active master-master replication. I had customers using it in the past. The amount of trouble they went into because of that setup, and in spite of my counter-recommendations, led to many ruined weekends and holidays.

You might want to look at Galera replication. It's a synchronized master replication solution, and I hear it works over WAN. This will allow your application to remain ignorant. Watch out for network failures, as one part of your cluster will turn immutable.

Best Answer

Related Solutions

Mysql – 3 Databases, Best Replication Setup

Mysql – Master – Master replication Duplicate entry error

Related Question