PostgreSQL Replication – How to Replicate Only Significant Records

postgresqlreplication

I have a remote database which holds data for various databases in various locations. I want to do database replication, but only data specific to a certain database should be replicated to it. I can identify the data using a column id.

i.e if my id starts with dba23 then this data should be replicated to database with the dba23 in this case collection_db.

Am using dblink currently, but I want a real-time solution.

suggestions are highly welcome

Best Answer

You should be able to do this with slony. Slony is an asynchronous (but near real-time) trigger based single-master/multi-slave replication solution for PostgreSQL.

Take a look at the "Retail Store Problem" in this presentation (OpenOffice). This example actually deals with simulating a multi-master replication but the trick used can also be applied in the other direction for your use case:

  1. On the master use table inheritance to route your data (according to the id) from the parent table to a slave specific child table. This can be done with insert triggers or rules.
  2. Replicate the slave specific child table(s) to the slave database.
  3. On the slave use table inheritance again to have the same table name in all databases (parent table only wrapper for slave specific child table).

Instead of table inheritance you could also use INSTEAD OF triggers on views with PostgreSQL 9.1. The little ascii art below should give you an idea of the basic structure of this solution.

db_master                          db_slave_dba23       db_slave_dba17

tbl_parent                         tbl_parent           tbl_parent
|                                  ^                    ^
v                                  |                    |
tbl_child_dba23 --- replicate ---> tbl_child_dba23      |
|                                                       |
v                                                       |
tbl_child_dba17 --- replicate ------------------------> tbl_child_dba17
|
v
...

Be aware that while (under certain pre-conditions) slony can well be used as a production grade replication system, such a complex replication solution can get very cumbersome to maintain and monitor. Especially if DDL statements are involved in your daily database use cases this solution may not be suitable at all.

CAVEATS

  • The replicated tables are read-only on the slaves
  • Replication is not transactional (but neither is dblink)
  • Slony can be a real PITA