PostgreSQL 9.3 – Primary Key Violation by Trigger INSERT

postgresql, stored-procedures, transaction, trigger

My problem

Consider a table t with many frequent updates from users, from which only the last few are relevant.

To keep the table size reasonable, whenever a new row is inserted, old rows with the same user_id are deleted. To keep an archive, every inserted row is also written to t_history.

Both t and t_history have the same schema, in which id is a bigserial with a primary key constraint.

Implementation

Stored procedure

CREATE FUNCTION update_t_history()
RETURNS trigger
AS
$$
declare
BEGIN
    -- Insert the row to the t_history table. `id` is autoincremented
    INSERT INTO t_history (a, b, c, ...)
    VALUES (NEW.a, NEW.b, NEW.c, ...);

    -- Delete old rows from the t table, keep the newest 10 
    DELETE FROM t WHERE id IN (
                  SELECT id FROM t 
                  WHERE user_id = NEW.user_id 
                  ORDER BY id DESC
                  OFFSET 9);
    RETURN NEW;
END;
$$
LANGUAGE plpgsql;

Corresponding insertion trigger:

CREATE TRIGGER t_insertion_trigger
AFTER INSERT ON t
FOR EACH ROW
EXECUTE PROCEDURE update_t_history();

The error

The trigger works well, but when I run a few dozen insertions in a single transaction, I get the following error:

BEGIN
ERROR:  duplicate key value violates unique constraint "t_history_pkey"
DETAIL:  Key (id)=(196) already exists.

Updates

The id field in both tables (from \d+ t):
- id|bigint|not null default nextval('t_id_seq'::regclass)
- "t_pkey" PRIMARY KEY, btree (id)
PostgreSQL version is 9.3.

Any idea why the stored procedure breaks the primary key constraint in transactions?

Best Answer

Why is t_history.id auto-incremented in the first place? If "both t and t_history have the same schema", and t.id is a serial PK, you can just copy whole rows.
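A minimal sketch of that idea, assuming t_history.id should only ever receive ids copied from t and never generate its own (this statement is an illustration, not something shown in the question):

```sql
-- Assumption: ids in t_history always come from rows copied out of t,
-- so the column no longer needs a sequence default.
ALTER TABLE t_history ALTER COLUMN id DROP DEFAULT;
```

With the default gone, an INSERT that forgets to supply id fails immediately on the NOT NULL constraint instead of silently drawing a value from the sequence.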

I would also suggest you only copy rows you actually delete from t to t_history - in a data-modifying CTE. This way you do not have overlapping rows (which might be part of the problem).

CREATE FUNCTION update_t_history()
  RETURNS trigger AS
$func$
BEGIN
   -- Keep the newest 10, move older rows to t_history
   WITH del AS (
      DELETE FROM t
      USING (
         SELECT id
         FROM   t 
         WHERE  user_id = NEW.user_id 
         ORDER  BY id DESC
         OFFSET 10      -- to keep 10 (not 9)
         FOR UPDATE     -- avoid race condition
         ) d
      WHERE t.id = d.id
      RETURNING t.*
      )
   INSERT INTO t_history 
   SELECT * FROM del;   -- copy whole row

   RETURN NULL;         -- irrelevant in AFTER trigger
END
$func$  LANGUAGE plpgsql;

The new row is already visible in an AFTER trigger.
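To check the behaviour, something like the following could be run in a scratch database. The table definitions are hypothetical stand-ins (the real tables have more columns, which the question elides), and the function update_t_history() from above is assumed to exist already:

```sql
-- Hypothetical minimal schema, just enough to exercise the trigger.
CREATE TABLE t (
    id      bigserial PRIMARY KEY,
    user_id bigint NOT NULL,
    a       text
);

-- Same columns in the same order, but no default on id:
-- ids are copied from t, never generated here.
CREATE TABLE t_history (
    id      bigint PRIMARY KEY,
    user_id bigint NOT NULL,
    a       text
);

CREATE TRIGGER t_insertion_trigger
AFTER INSERT ON t
FOR EACH ROW
EXECUTE PROCEDURE update_t_history();

-- A DO block runs in a single transaction: 25 separate INSERT statements.
DO $$
BEGIN
   FOR i IN 1..25 LOOP
      INSERT INTO t (user_id, a) VALUES (1, 'row ' || i);
   END LOOP;
END
$$;

SELECT count(*) FROM t         WHERE user_id = 1;  -- the 10 newest rows remain
SELECT count(*) FROM t_history WHERE user_id = 1;  -- the 15 older rows were moved
```

Note that INSERT INTO t_history SELECT * FROM del relies on both tables having their columns in the same order; listing the columns explicitly would be more robust if the schemas ever drift.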

Related Solutions

PostgreSQL – insert/update violates foreign key constraints

There are a few problems with your tables. I'll try to address the foreign keys first, since your question asked about them :)

But before that, we should realize that the two sets of tables (the first three you created and the second set, which you created after dropping the first set) are the same. Of course, the definition of Table3 in your second attempt has syntax and logical errors, but the basic idea is:

CREATE TABLE table3 (   
  "ID" bigint NOT NULL DEFAULT '0',   
  "DataID" bigint DEFAULT NULL,   
  "Address" numeric(20) DEFAULT NULL,   
  "Data" bigint DEFAULT NULL,
   PRIMARY KEY ("ID"),   
   FOREIGN KEY ("DataID") REFERENCES Table1("DataID") on delete cascade on update cascade,   
   FOREIGN KEY ("Address") REFERENCES Table2("Address") on delete cascade on update cascade
);

This definition tells PostgreSQL roughly the following: "Create a table with four columns; one will be the primary key (PK), the others can be NULL. If a new row is inserted, check DataID and Address: if they contain a non-NULL value (say 27856), then check Table1 for DataID and Table2 for Address. If there is no such value in those tables, then return an error." This last point is what you saw first:

ERROR:  insert or update on table "Table3" violates foreign key constraint "Table3_DataID_fkey"
DETAIL:  Key (DataID)=(27856) is not present in table "Table1".

It's that simple: if there is no row in Table1 where DataID = 27856, then you can't insert that row into Table3.

If you need that row, you should first insert a row into Table1 with DataID = 27856, and only then try to insert into Table3. If this is not what you want, please describe in a few sentences what you want to achieve, and we can help with a good design.
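The required order can be sketched as follows. The two table definitions are hypothetical, stripped down to the columns that matter for the foreign key, and are meant for a scratch database:

```sql
-- Hypothetical minimal parent and child tables.
CREATE TABLE table1 (
    "DataID" bigint PRIMARY KEY
);

CREATE TABLE table3 (
    "ID"     bigint PRIMARY KEY,
    "DataID" bigint REFERENCES table1 ("DataID")
                    ON DELETE CASCADE ON UPDATE CASCADE
);

-- Running the child insert first would raise the foreign key violation,
-- because table1 has no row with "DataID" = 27856 yet:
-- INSERT INTO table3 ("ID", "DataID") VALUES (1, 27856);

-- Parent first, then child:
INSERT INTO table1 ("DataID") VALUES (27856);
INSERT INTO table3 ("ID", "DataID") VALUES (1, 27856);
```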

And now about the other problems.

You define your PKs as

CREATE TABLE all_your_tables (
    "ID" bigint NOT NULL DEFAULT '0',
    [...]
    PRIMARY KEY ("ID"),

A primary key means that all the values in it are different from each other, that is, the values are UNIQUE. If you give a static DEFAULT (like '0') to a UNIQUE column, the first insert that relies on the default succeeds and every later one fails with a duplicate key error. This is what you got in your third error message.

Furthermore, '0' is a quoted string literal, not a number; PostgreSQL has to coerce it to the column type (bigint or numeric in your case). Simply use 0 instead (or don't use a default at all, as I wrote above).
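A small sketch of the problem, using hypothetical table names. The second half shows one common alternative, a sequence-backed key, which is a suggestion of this sketch rather than something taken from the original tables:

```sql
-- A constant default on a primary key: only one row can ever use it.
CREATE TABLE demo_static_default (
    "ID"   bigint NOT NULL DEFAULT 0,
    "Data" bigint,
    PRIMARY KEY ("ID")
);

INSERT INTO demo_static_default ("Data") VALUES (1);  -- "ID" falls back to 0
INSERT INTO demo_static_default ("Data") VALUES (2);  -- "ID" is 0 again: duplicate key error

-- Alternative (assumption): let a sequence generate distinct ids.
CREATE TABLE demo_generated_id (
    "ID"   bigserial PRIMARY KEY,
    "Data" bigint
);

INSERT INTO demo_generated_id ("Data") VALUES (1), (2);  -- each row gets its own id
```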

And a last point (I may be wrong here): in Table2, your Address field is set to numeric(20). At the same time, it is the PK of the table. The column name and the data type suggest that this address can change in the future. If this is true, then it is a very bad choice for a PK. Think about the following scenario: you have an address '1234567890454', which has a child in Table3 like

ID        DataID           Address             Data
123       3216547          1234567890454       654897564134569

Now that address happens to change to something else. How do you make your child row in Table3 follow its parent to the new address? (There are solutions for this, but they can cause much confusion.) If this is your case, add an ID column to your table that carries no real-world information and simply serves as an identification value (that is, an ID) for an address.
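A minimal sketch of such a surrogate key. The table and column names (address, address_data, address_id) are made up for illustration and do not come from the original schema:

```sql
-- The surrogate key identifies the row; the real-world address is ordinary data.
CREATE TABLE address (
    address_id bigserial   PRIMARY KEY,       -- hypothetical surrogate key
    "Address"  numeric(20) NOT NULL UNIQUE    -- may change over time
);

CREATE TABLE address_data (
    "ID"       bigint PRIMARY KEY,
    address_id bigint REFERENCES address (address_id) ON DELETE CASCADE,
    "Data"     bigint
);

INSERT INTO address ("Address") VALUES (1234567890454);

INSERT INTO address_data ("ID", address_id, "Data")
SELECT 123, address_id, 654897564134569
FROM   address
WHERE  "Address" = 1234567890454;

-- The address changes; the child row still points at the same address_id.
UPDATE address
SET    "Address" = 9876543210454
WHERE  "Address" = 1234567890454;
```

The child table never stores the address itself, so a change to the address touches exactly one row.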

PostgreSQL Triggers – How to Prevent a Trigger from Being Fired by Another Trigger

(Obvious error in the trigger logic aside.)
In Postgres 9.2 or later, use the function pg_trigger_depth() that @Akash already mentioned in a condition on the trigger itself (instead of the body of the trigger function), so that the trigger function is not even executed when called from another trigger (including itself - so also preventing loops).
This typically performs better and is simpler and cleaner:

CREATE TRIGGER set_history
BEFORE UPDATE ON field_data
FOR EACH ROW 
WHEN (pg_trigger_depth() < 1)
EXECUTE PROCEDURE gener_history();

The expression pg_trigger_depth() < 1 is evaluated before the trigger function is entered. So pg_trigger_depth() returns 0 in the first call and the condition is true. When called from another trigger, the value is higher, the condition is false, and the trigger function is not executed.
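For contrast, the same guard placed in the body of the trigger function might look like the sketch below. The function and trigger names (gener_history_guarded, set_history_guarded) are hypothetical, and the actual history logic is left out because it is not shown here. Note the different threshold: inside a trigger function fired by a top-level statement, pg_trigger_depth() already returns 1.

```sql
-- Hypothetical variant with the guard inside the function body.
CREATE OR REPLACE FUNCTION gener_history_guarded()
  RETURNS trigger AS
$func$
BEGIN
   -- Depth is 1 when a top-level statement fired this trigger;
   -- anything above 1 means another trigger caused this call.
   IF pg_trigger_depth() > 1 THEN
      RETURN NEW;   -- skip the history logic, let the UPDATE proceed
   END IF;

   -- (history logic would go here)

   RETURN NEW;
END
$func$ LANGUAGE plpgsql;

CREATE TRIGGER set_history_guarded
BEFORE UPDATE ON field_data
FOR EACH ROW
EXECUTE PROCEDURE gener_history_guarded();
```

This version still enters the function on every nested call before returning early, which is the overhead the WHEN condition avoids.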