Postgresql – Logical replication – Manually triggering update of data on destination tables (after schema change)

postgresqlpostgresql-10replication

I'm testing out Logical replication on Postgresql (10.4 at the moment) with a quite specific use case.

I have some central DB (publisher) from where I need to replicate few tables to distributed DB's (subscribers) (across different servers).

This is the scenario:

  1. Table in Master DB changes (gets new column). Lets say that this new column is of type "timestamp without timezone", not null, with default value of now().
  2. Because of 1) replication stops working, which is fine. (logical replication target relation <relation> is missing some replicated columns)
  3. After some time, distributed DBs also get the same change (new column).
  4. At this point, replication starts working again and new rows are replicated, including the data from new column (for new rows).
  5. However, for existing rows, the values (timestamps) from new column in master DB table and replicated DB table are different (by time between kicking off the migration on master table and replica tables) and do not get updated up until row changes in Master DB/source table.

My question: is there a way to "force" update new column data replication without truncating table data, updating all rows in source table or similar very time&resource consuming operations?

Also, performance related note – there will be 1-10 million rows in one of tables that is being replicated.

I'm open to all suggestions&ideas.

Thanks.

Best Answer

You would want to add the new column to the subscriber with the appropriate default already in place. You will have to fish that appropriate value out of the publisher, I don't know of a better way to get it. After the column is added, you might want to either drop the default, or turn it back to now(). If the only way the table gets inserted/updated is from the publication, then this isn't necessary as the publisher always sends the actual value and so doesn't use the default. But leaving the unused default in place seems like a recipe for confusion.

begin;
alter table pgbench_accounts add column newcol2 timestamp not null default '2019-03-26 09:44:56.391128';
alter table pgbench_accounts alter newcol2 set default now();
commit;

Note that in version 10, this will still be pretty resource intensive, as adding a column with a non-NULL default requires a complete table re-write the subscriber, just as it did on the publisher. (This was changed in version 11 to use "fast default"). But the intensive work is confined within each server, not the connection between them.