Postgresql – Generate data delta between two databases PostgreSQL

postgresqlpostgresql-9.6

I have a big database with a lot of data ~50GB that spans around 40 tables – running PostgreSQL 9.6. This database is copied and sits in number of client sites without an Internet connection. Changes that are made to this database should be delivered to the client sites.

I'm looking for a way to generate a delta script between the current version of the database to the previous and execute this script in each client's site (open source/ freeware solution – RedGate is not an option).

The main assumption is that the schema is the same – so the delta is only in data (insert / update / remove rows)

I have checked a number of proposed tools (here for example) and the one that worked nice was SQLWorkbench-J using its WbDataDiff command. But its very slow – even if the databases resides on the same machine. I assume that its because the delta is calculated externally to support different databases sources.

Does anyone have a suggested way to do that? Is there a way to make it run internally on the PostgreSQL instance to increase the performance (I understand that joins cannot be calculated using different databases according to here – is using the "foreign data wrapper" may help in this situation)?

Best Answer

The traditional mechanism if you can

  • Dump the tables at point-in-time into a CSV (\COPY with ORDER BY)
  • Modify the tables, or wait for modiications
  • Dump the tables again at a point-in-time (\COPY with ORDER BY)
  • Diff the CSVs
  • Thennn
    • If the rows are not present in the second versions, write a DELETE (or batch for a larger delete)
    • Otherwise write a INSERT ON CONFLICT .. DO UPDATE (or batch for a larger INSERT)

This handles INSERTS, UPDATES and DELETES and only requires you to check for the presence of the surrogate key. If you don't have a surrogate key, things obviously get a lot more complex. And, even more complex if you have no UNIQUE keys at all.

Alternatively, dump the table as a 50gB CSV as an .xz (or other compressed format) and just be done with it. Probably compresses to a few GB, in a few hours.