PostgreSQL AWS – Schema Replication on Amazon RDS

amazon-rdsawspostgresqlreplication

Is there a way to replicate Postgres Schemas in AWS into one location for reporting purposes?

I have 3 postgres databases (up to date with current version) running in AWS that I would like to replicate into one location to build an Operation Data Store (ODS). I require the ability to query cross databases – I've primarily only used MSSQL – and I believe that you can only cross query schemas.

Or do I have to put this into another amazon product to be able to achieve this result?

Best Answer

I don't know how these things apply to RDS in particular, but regarding community PostgreSQL at large, there are a few options:

Use logical replication to replicate data from the other servers into a single PostgreSQL database. A major caveat here is that the tables on the central server need to have the exact same table name, including schema name, as they do on the outlying servers. So if the outlying servers have tables with the same table name and schema name as each other, that will be a problem. If you are starting from scratch, it is pretty easy to solve this by having each server include the server descriptions in all of its schema names so that all schema names are unique (i.e. "london_accounts", "chicago_accounts", rather than just "accounts" on both servers). But if you are adapting existing systems which didn't abstract away the schema names, it can be a major pain to have to change everything to fit this pattern.

Or, you could just leave the data where it is, and use the foreign data wrapper postgres_fdw to query it from the central location. Caveats here are that all the outlying servers have to be online for this to work, the high ping time and possible low network throughput to them can be major performance problem, and FDW has higher overhead and often comes up with worse plans than than native data.

You can overcome the uptime, ping, and throughput issues by using physical replication of each outlying servers to the central location (e.g. three database instances running on three different port numbers of the same piece of hardware), and then use FDW from a 4th server to each replica to tie those replicas together. You might still suffer from some overhead and planning issues.