Postgresql – Plpgsql seems to be deleting and inserting instead of updating – Why

plpgsqlpostgresql

I am using postgreSQL 7.4.

I have a large table , call it table_a:

key1 INT NOT NULL, 
key2 INT NOT NULL, 
data INT NOT NULL, 
itstamp INT NOT NULL DEFAULT (date_part('EPOCH'::text, (timeofday())::timestamp without time zone))::INTEGER

and a table that summaries the last update time for key1, call it table_b:

key1        INT NOT NULL,
max_itstamp INT NOT NULL

I created a trigger function in plpgsql to update or insert rows in table_b as necessary:

CREATE OR REPLACE FUNCTION table_b_update() RETURNS TRIGGER AS '
 DECLARE
  l_key1 INT;
  l_itstamp INT;
 BEGIN
  l_key1 := new.key1;
  l_itstamp := new.itstamp;
  PERFORM TRUE FROM table_b WHERE key1=l_key1;
  IF NOT FOUND THEN 
   INSERT INTO table_b(key1, max_itstamp) values (l_key1, l_itstamp);
  ELSE
   UPDATE table_b SET max_itstamp=l_itstamp WHERE key1=l_key1;
  END IF;
  RETURN NULL;
 END'
LANGUAGE plpgsql IMMUTABLE;

and then I attached a trigger to table_a:

CREATE TRIGGER table_a_trigger1 AFTER INSERT OR UPDATE ON table_a FOR EACH ROW
EXECUTE PROCEDURE table_b_upate();

Now, the time to insert new data into table_a grows incrementally. The file footprint of table_b grows steadily.

I have used RAISE NOTICE commands in the function to confirm that the If statement causes an UPDATE and not an INSERT after the first call per key.

Since the table_a insert time grows for each INSERT, I tried a VACUUM FULL on table_b. The table_a insert time was reduced considerably. The file size for table_b was reduced considerably. After the VACUUM FULL the table_a insert time started to grow again. I don't want to do a VACUUM FULL after every INSERT into table_a though.

Is it possible that the UPDATE is actually doing a DELETE and INSERT in table_b?

Best Answer

I don't have 7.4 to test on, but I'm guessing:

  • every time you do a vacuum full the table compacts
  • every time you update, the new version of the row (see MVCC) gets shoved at the end of the heap before the old one is removed by a vacuum

See here for the docs explaining this in more detail, but the simple solution is not to run vacuum full at all - just vacuum. Then your table will probably settle into a steady state where 'holes' in the data are left and can be used by later updates.

As for "insert time", I'm surprised at your results. My expectation would be that insert time would be slower after a vacuum full - but if all the blocks are in the cache, the overhead of finding free space inside the current block might be higher than adding the new row at the end of the heap even if the number of blocks accessed is higher