Postgresql – How to DRY up a series of upserts of the same fields from multiple tables

data-warehouseetlpostgresql

I have a bunch of tables that all have a storenumber, extracteddate and extractedtime column.

I'm doing ETL on these tables and I want to put the last extracted date and time for each store in a single table, with an additional column for the input table name. So I've ended up repeating the same upsert a bunch of times and really the only thing that is different is the CTE that I am using for input:

with rows as (
  select distinct on(storenumber)
         storenumber, extracteddate, extractedtime
    from storefinancialdate
  )
  insert into star.stat_etl (storenumber, filename, last_extracted, last_loaded)
    select rows.storenumber,
           'storefinancialdate',
           (rows.extracteddate + rows.extractedtime::time),
           transaction_timestamp()
      from rows
    on conflict(storenumber, filename) do update
      set last_extracted = excluded.last_extracted,
          last_loaded    = excluded.last_loaded;

with rows as (
  select distinct on(storenumber)
         storenumber, extracteddate, extractedtime
    from storeproduct
)
insert into star.stat_etl (storenumber, filename, last_extracted, last_loaded)
  select rows.storenumber,
         'storeproduct',
         (rows.extracteddate + rows.extractedtime::time),
         transaction_timestamp()
    from rows
  on conflict(storenumber, filename) do update
    set last_extracted = excluded.last_extracted,
        last_loaded    = excluded.last_loaded;

-- repeat a bunch more times...

Is there some way that I can DRY this up? I can't pass a CTE into a function, can I?

(I get that I could call a function on each record of each CTE, one at a time. That would make the code shorter but also much, much slower. So I don't want to do that.)

Best Answer

You could use UNION in the CTE. There is still some duplication of code but less than before:

with etl (storenumber, filename, last_extracted) as
(
    ( select distinct on (storenumber)
          storenumber, 
          'storefinancialdate',
          extracteddate + extractedtime::time 
      from storefinancialdate
      order by
          storenumber,
          extracteddate desc, extractedtime desc
    )
  union all
    ( select distinct on (storenumber)
          storenumber, 
          'storeproduct',
          extracteddate + extractedtime::time 
      from storeproduct
      order by
          storenumber,
          extracteddate desc, extractedtime desc
    )
  -- union all ...
)
insert into star.stat_etl 
    (storenumber, filename, last_extracted, last_loaded)
select 
    storenumber,
    filename,
    last_extracted,
    transaction_timestamp()
from etl
  on conflict (storenumber, filename) do update
    set last_extracted = excluded.last_extracted,
        last_loaded    = excluded.last_loaded ;