Postgresql – Many-to-One replication / Transactional Replication

multi-masterpostgresqlreplicationtransactional-replication

I have several PostgreSQL DBs in different geographical locations (local sites).

  • Each local site DB have the same schema, but unique data. For example, Take a table with columns: Site_ID, Department_ID, Department_Name. Site_ID is unique for each site.

  • I want to collect all the data from the local site DBs into a centralised DB (PostgreSQL again) which acts as a data warehouse.

  • The corresponding example table on the centralised DB will have the same columns as above. All local site data will go into this table. Each site data designated by the Site_ID, of course.

Question: How to achieve this with PostgreSQL replication methods (streaming/multi-master UDR/BDR/etc.)
I see this can be done with SQLServer using Transactional Replication. What is the best way to achieve this functionality with PostgreSQL?

Restriction: The local sites can make only outgoing network connections (i.e. no inbound connections due to firewall restrictions)

Best Answer

I see at least two options:

  • if you don't need near-real-time replication, but just once a day (or maybe every hour) is enough for you, and the amount of data is small, you can quite simply create a script that will run in each remote location:

    1. connect to the local database
    2. do a dump of the local data (pg_dump --data-only should be fine)
    3. connect to the remote database
    4. remove all data for that location from it
    5. push the data from the dump.
  • if you need near-real-time replication, or the volume of data is large (but the changes are small in comparison), you could look into Slony. Haven't used it in a while, and it's not necessarily its typical use scenario, but I believe it should work as the datasets are distinct.

    Slony uses triggers on the tables to store changes in a log, and then daemons that transfer the logs from one server to the other, and apply the changes. It would be worth checking if this would work in the "can't connect from central to remote" scenario, though. If it doesn't, you'll need to consider some form of VPN (which you should anyway).