Postgresql – Need a log based replication / CDC solution to publish postgres to oracle subscribers

postgresqlreplication

We have a moderately large postgres 9.1 db that is currently 3TB and will likely grow much larger over the next couple of years. we need a reliable and fast solution for moving this data into either oracle 11G or sql server 2012 databases. we need a log based solution like replication or CDC with a minimal foot print of the postgres server and not looking for a code based ETL solution like SSIS or SAP Data Services.

It should also be able to effectively handle blob / spatial data that is stored in postgres

Best Answer

There is no such feature at this time and the transaction logs lack some of the key information you would need to implement such a feature.

Work is ongoing for PostgreSQL 9.4 to implement logical and bi-directional replication between PostgreSQL instances. Part of this work involves enhancements to the xlogs to add some key information required for logical row replication, adding a new wal_level = 'logical' option. There's also a pluggable infrastructure for decoding the WAL and streaming it to remote hosts, writing SQL to a file, etc.

That's where it becomes useful to you. Your best bet at this point is probably to use that infrastructure to decode PostgreSQL WAL into a change stream you can send to your remote Oracle or SQL Server systems. This will involve a considerable amount of development, as the logical replication code is somewhat immature and still has some limitations even with Postgresql-to-PostgreSQL limitations. DDL is totally unsupported at this time, for one thing.

If you do this you'll need to understand that the BDR work will probably not be committed into PostgreSQL 9.4 in a fully compatible form so you'd have to do re-work to make it compatible with main-line PostgreSQL later.

You'll want to start by adapting the existing text output plugin for the logical walsender to generate more database-agnostic SQL.

If this all sounds way too hard then your only other option is to implement a trigger-based replication solution or adapt an existing trigger based replication system to meet your needs. This will be much (much) higher overhead on the server, but a fair bit simpler.

I should disclose that I work for 2ndQuadrant, who're developing bidirectional replication for PostgreSQL, and I've recently started working on testing and documentation as part of the project.


Update as of 2014-07: Logical changeset extraction / logical decoding was committed to PostgreSQL 9.4, and is due to be released soon. Beta 2 came out a few days ago.

See:

Logical decoding makes it possible to implement what you want, but does not provide the feature you want directly. To stream changes into MS SQL, Oracle, etc, you would need an output plugin for logical decoding that can stream data to a streaming logical log receiver. The receiver application would maintain a connection to the target server and apply the changes as it receives them from PostgreSQL.

This is the basis of the PostgreSQL BDR/UDR architecture. BDR is not directly useful for your purposes, but the logical changeset decoding feature that was built to make BDR possible will be.

A good starting point would be to look at the test_decoding plugin