Postgresql – Insert results from query, update old existing row if query doesn’t still return same value

postgresqlsurrogate-keyupsert

I am trying to refactor a table which contains historical data and therefore isn't ever deleted, but has an 'active' field.

Each day we run a query (a few joins, but quite slow) to get some data to insert into this table:

  • if the row exists already, and is present in the query, we do nothing
  • if the row exists but the result isn't in todays query, we set 'active' to false
  • if the row doesn't exist but is in the query, we insert it (with active defaulting to true)

It is possible for an entity to be active, made inactive (same row), then a few days later become active again (would be a new row).

The table is using a surrogate key, as the natural key can be repeated as noted above, though there should only ever be a single entry that is active on the natural key.

I'm curious if there is an action or 'pattern' I can take that will enable such functionality – I realise I'm asking a lot 🙂

I am using a java library (jOOQ) to generate the sql, so I am only looking for sql as the solution, no PL/pgSQL and no stored procs. I doubt it is possible without 2 passes (1. perform the update where not in query result, 2. perform the insert from query result) but I'm intrigued if there are possibilities.

Edit: Example table history, where ** is an update to same row

pk_surrogate | natural_id | active| date_made_active
1            | 1          | true  | 2020-01-01 -- row for entity 1 added
1 **         | 1          | false | 2020-01-01 -- row for entity 1 no longer present on 2020-01-02, so marked inactive
2            | 1          | true  | 2020-01-03 -- new row for entity 1 added when it was present on 2020-01-03
3            | 2          | true  | 2020-01-03 --new value for 2020-01-03

Best Answer

Something like this maybe?

with daily_query as (
  .... this is your current query ...
), upsert as (
  -- covers first and third requirement
  insert into the_table (....)
  select ...
  from daily_query
  on conflict (pk_column) do nothing
)
-- this sets those rows to in active that exist in the target table
-- but are not in the initial query
update the_table t
  set active = false
where not exists (select * 
                  from daily_query q
                  where q.pk_column = t.pk_column);