I am using PostgreSQL 7.4.
I have a large table, call it table_a:
key1 INT NOT NULL,
key2 INT NOT NULL,
data INT NOT NULL,
itstamp INT NOT NULL DEFAULT (date_part('EPOCH'::text, (timeofday())::timestamp without time zone))::INTEGER
and a table that summarises the last update time for key1, call it table_b:
key1 INT NOT NULL,
max_itstamp INT NOT NULL
I created a trigger function in plpgsql to update or insert rows in table_b as necessary:
CREATE OR REPLACE FUNCTION table_b_update() RETURNS TRIGGER AS '
DECLARE
l_key1 INT;
l_itstamp INT;
BEGIN
l_key1 := new.key1;
l_itstamp := new.itstamp;
PERFORM TRUE FROM table_b WHERE key1=l_key1;
IF NOT FOUND THEN
INSERT INTO table_b(key1, max_itstamp) values (l_key1, l_itstamp);
ELSE
UPDATE table_b SET max_itstamp=l_itstamp WHERE key1=l_key1;
END IF;
RETURN NULL;
END'
LANGUAGE plpgsql IMMUTABLE;
and then I attached a trigger to table_a:
CREATE TRIGGER table_a_trigger1 AFTER INSERT OR UPDATE ON table_a FOR EACH ROW
EXECUTE PROCEDURE table_b_update();
Now, the time to insert new data into table_a grows with each insert, and the on-disk footprint of table_b grows steadily.
I have used RAISE NOTICE commands in the function to confirm that the IF statement takes the UPDATE branch, not the INSERT branch, after the first call for each key.
Since the table_a insert time grows for each INSERT, I tried a VACUUM FULL on table_b. The table_a insert time was reduced considerably. The file size for table_b was reduced considerably. After the VACUUM FULL the table_a insert time started to grow again. I don't want to do a VACUUM FULL after every INSERT into table_a though.
Is it possible that the UPDATE is actually doing a DELETE and INSERT in table_b?
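One way to check this directly, I believe, is VACUUM VERBOSE, which reports the dead row versions that updates leave behind (table name as in my schema above; the output shown is only illustrative of the kind of report 7.4 prints, not real numbers):

```sql
-- Each UPDATE on table_b leaves the old row version behind as a dead tuple.
-- VACUUM VERBOSE reports these; a dead-tuple count that keeps pace with the
-- number of INSERTs into table_a would confirm that each UPDATE is writing
-- a new row version rather than modifying the row in place.
VACUUM VERBOSE table_b;
-- INFO:  vacuuming "public.table_b"
-- INFO:  "table_b": removed ... row versions in ... pages
```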
Best Answer
I don't have 7.4 to test on, but I'm guessing:
vacuum full compacts the table. On update, the new version of the row (see MVCC) gets shoved at the end of the heap before the old one is removed by a vacuum.
See here for the docs explaining this in more detail, but the simple solution is not to run vacuum full at all - just vacuum. Then your table will probably settle into a steady state where 'holes' in the data are left and can be used by later updates.
As for "insert time", I'm surprised at your results. My expectation would be that insert time would be slower after a vacuum full - but if all the blocks are in the cache, the overhead of finding free space inside the current block might be higher than adding the new row at the end of the heap, even if the number of blocks accessed is higher.
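As a sketch of what that routine maintenance could look like (the idea of running it from cron is my suggestion, not something from the question):

```sql
-- Plain VACUUM marks dead row versions as reusable free space without
-- rewriting the table or taking an exclusive lock, so later UPDATEs can
-- fill the 'holes' and the file stops growing once it reaches steady state.
-- ANALYZE refreshes planner statistics at the same time.
VACUUM ANALYZE table_b;
```

PostgreSQL 7.4 has no built-in autovacuum daemon, but the pg_autovacuum contrib program can run vacuums like this for you automatically; otherwise a periodic cron job works.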