PostgreSQL 11.5 – History Table Design for Deletions

auditpostgresqltrigger

I've got a question about the design of a history table in Postgres.

The setup is that I've got a table that holds a needs list. A location recalculates in-demand items every five minutes, and pushes that list to Postgres. The current "hot" list is then accessible to various client apps to pull. So, every five minutes, the rows related to a specific location and deleted and then repopulated with whatever is now hot. Imagine a screen on a wall in a warehouse where people look up to see urgent tasks, that kind of thing. This is more-or-less a queue/notice table, not a real storage table.

What we're tracking in the list of in demand items are specific parts, with IDs. It is valuable for us to collect data (or at least stats) over time. We might find that specific items show up on the list every day, while others appear only rarely. That can help guide purchasing choices and such.

That's the background, I'm in Postgres 11.5, so no generated columns. Does the strategy described below seem about right, or can it be improved upon? The base table is called need and the history table is called need_history

need
— Stores the data of interest
— Has a NOW() assigned to created_dts on INSERT as part of the table setup.
— Has a PER STATEMENT after trigger to get the 'transition table' of deleted rows.
— The statement trigger INSERTS INTO need_history to preserve the data.

need_history
— It's almost a clone of need, but with a few extra fields bolted on. Specifically, deleted_dts, assigned with NOW() as a default when the data is inserted, and duration_seconds that stores the ~ number of seconds the record existed in the need table.
— Since this is PG 11.5, no generated columns, so I'll need a EACH ROW trigger to calculate duration_seconds.

Shorter:
need
with a statement level delete trigger that pushes to need_history.

need_history
with a row-level trigger to calculate duration_seconds since I don't have generated columns available in PG 11.x.

And, to address the obvious question, no, I don't have to store the derived duration_seconds value as it can be generated on-the-fly but, in this case, I want to denormalize to simplify a variety of queries, sorts and summaries.

My brain is also saying "ask about fill factors", and I don't know why.

Below is the initial setup code, in case the summary above isn't clear. I haven't pushed any data through this yet, so it may have flaws.

I'd be grateful for any advice or recommendations about how to best do this in Postgres.

BEGIN;

DROP TABLE IF EXISTS data.need CASCADE;

CREATE TABLE IF NOT EXISTS data.need (
    id uuid NOT NULL DEFAULT NULL,
    item_id uuid NOT NULL DEFAULT NULL,
    facility_id uuid NOT NULL DEFAULT NULL,
    hsys_id uuid NOT NULL DEFAULT NULL,
    total_qty integer NOT NULL DEFAULT 0,
    available_qty integer NOT NULL DEFAULT 0,
    sterile_qty integer NOT NULL DEFAULT 0,
    still_need_qty integer NOT NULL DEFAULT 0,
    perc_down double precision NOT NULL DEFAULT '0',
    usage_ integer NOT NULL DEFAULT 0,
    need_for_case citext NOT NULL DEFAULT NULL,
    status citext NOT NULL DEFAULT NULL,
    created_dts timestamptz NOT NULL DEFAULT NOW(),

CONSTRAINT need_id_pkey
    PRIMARY KEY (id)
);


ALTER TABLE data.need OWNER TO user_change_structure;

COMMIT;

/* Define the trigger function to copy the deleted rows to the history table. */
CREATE FUNCTION data.need_delete_copy_to_history()  
  RETURNS trigger AS
$BODY$
BEGIN
        /* need.deleted_dts      is auto-assigned on INSERT over in need, and 
           need.duration_seconds is calculated in an INSERT trigger (PG 11.5, not PG 12, no generated columns). */

   INSERT INTO data.need_history 
            (id,
            item_id,
            facility_id,
            hsys_id,
            total_qty,
            available_qty,
            sterile_qty,
            still_need_qty,
            perc_down,
            usage_,
            need_for_case,
            status,
            created_dts)

     SELECT id,
            item_id,
            facility_id,
            hsys_id,
            total_qty,
            available_qty,
            sterile_qty,
            still_need_qty,
            perc_down,
            usage_,
            need_for_case,
            status,
            created_dts

       FROM deleted_rows;

    RETURN NULL; -- result is ignored since this is an AFTER trigger       
END;
$BODY$
LANGUAGE plpgsql;

 /* Bind a trigger event to the function. */
DROP TRIGGER IF EXISTS trigger_need_after_delete ON data.need;
CREATE TRIGGER trigger_need_after_delete 
    AFTER DELETE ON data.need
    REFERENCING OLD TABLE AS deleted_rows
    FOR EACH STATEMENT EXECUTE FUNCTION data.need_delete_copy_to_history();

/* Define the table. */
BEGIN;

DROP TABLE IF EXISTS data.need_history CASCADE;

CREATE TABLE IF NOT EXISTS data.need_history (
    id uuid NOT NULL DEFAULT NULL,
    item_id uuid NOT NULL DEFAULT NULL,
    facility_id uuid NOT NULL DEFAULT NULL,
    hsys_id uuid NOT NULL DEFAULT NULL,
    total_qty integer NOT NULL DEFAULT 0,
    available_qty integer NOT NULL DEFAULT 0,
    sterile_qty integer NOT NULL DEFAULT 0,
    still_need_qty integer NOT NULL DEFAULT 0,
    perc_down double precision NOT NULL DEFAULT '0',
    usage_ integer NOT NULL DEFAULT 0,
    need_for_case citext NOT NULL DEFAULT NULL,
    status citext NOT NULL DEFAULT NULL,
    created_dts timestamptz NOT NULL DEFAULT NULL,
    deleted_dts timestamptz NOT NULL DEFAULT NOW(),
    duration_seconds int4 NOT NULL DEFAULT 0,

CONSTRAINT need_history_id_pkey
    PRIMARY KEY (id)
);


ALTER TABLE data.need_history OWNER TO user_change_structure;

COMMIT;

/* Define the trigger function to update the duration count.
  In PG 12 we'll be able to do this with a generated column...easier. */

CREATE OR REPLACE FUNCTION data.need_history_insert_trigger() 
  RETURNS trigger AS
$BODY$
BEGIN
/* Use DATE_TRUNC seconds to get just the whole seconds part of the timestamps. */
NEW.duration_seconds =
      EXTRACT(EPOCH FROM (
        DATE_TRUNC('second', NEW.deleted_dts) - 
        DATE_TRUNC('second', NEW.created_dts)
        ));
  RETURN NEW;
END;
$BODY$
LANGUAGE plpgsql;


/* Bind a trigger event to the function. */
DROP TRIGGER IF EXISTS trigger_need_history_before_insert ON data.need_history;
CREATE TRIGGER trigger_need_history_before_insert 
    BEFORE INSERT ON data.need_history
    FOR EACH ROW EXECUTE FUNCTION data.need_history_insert_trigger();```

Best Answer

That looks fine.

The hard part with implementing a queue in SQL is not historization, but how to manage the queue itself (add, find and remove items). If there is a lot of traffic, you will probably need aggressive autovacuum settings for the queue table.

I would partition the history table. What people usually forget to design is how to get rid of old data. The history table may grow large, and you won't need the data indefinitely. If you have partitioned the table (so that there are between 10 and a couple of hundred partitions), it will be easy to get rid of old data.