I don't have 7.4 to test on, but I'm guessing:

- every time you do a `vacuum full`, the table compacts
- every time you `update`, the new version of the row (see MVCC) gets shoved at the end of the heap before the old one is removed by a vacuum

See the docs explaining this in more detail, but the simple solution is not to run `vacuum full` at all - just plain `vacuum`. Then your table will probably settle into a steady state where 'holes' in the data are left and can be reused by later updates.
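To see that steady state for yourself, you can watch the dead-tuple counters around a plain `vacuum` (the table name `my_table` is a placeholder):

```sql
-- Dead tuples accumulate as rows are updated (MVCC keeps the old versions).
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'my_table';

-- Plain VACUUM marks the dead space as reusable without rewriting the table,
-- so later updates can fill those holes instead of growing the heap.
VACUUM my_table;
```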
As for "insert time", I'm surprised at your results. My expectation would be that insert time would be slower after a `vacuum full` - but if all the blocks are in the cache, the overhead of finding free space inside the current block might be higher than adding the new row at the end of the heap, even if the number of blocks accessed is higher.
Basically we would like to create a TRIGGER for each table we want to be notified for an UPDATE/INSERT/DELETE operation. Once this trigger fires it will execute a function that will simply append a new row (encoding the event) to a log table that we will then poll from an external service.
That's a pretty standard use for a trigger.
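As a sketch of that pattern (all names here - `tasks`, `task_log`, `log_task_change` - are hypothetical):

```sql
CREATE TABLE task_log (
    id         bigserial   PRIMARY KEY,
    op         text        NOT NULL,   -- 'INSERT', 'UPDATE' or 'DELETE'
    task_id    integer     NOT NULL,
    changed_at timestamptz NOT NULL DEFAULT now()
);

CREATE OR REPLACE FUNCTION log_task_change() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'DELETE' THEN
        INSERT INTO task_log (op, task_id) VALUES (TG_OP, OLD.id);
    ELSE
        INSERT INTO task_log (op, task_id) VALUES (TG_OP, NEW.id);
    END IF;
    RETURN NULL;  -- return value is ignored for AFTER triggers
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tasks_log
AFTER INSERT OR UPDATE OR DELETE ON tasks
FOR EACH ROW EXECUTE PROCEDURE log_task_change();
```

An external service can then poll `task_log` ordered by `id`, remembering the last id it processed.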
Before going all in with Postgres TRIGGER(s) we would like to know how they scale: how many triggers can we create on a single Postgres installation?
If you keep creating them, eventually you'll run out of disk space.
There's no specific limit for triggers.
PostgreSQL limits are documented on the about page.
Do they affect query performance?
It depends on the trigger type, trigger language, and what the trigger does.
A simple PL/PgSQL `BEFORE ... FOR EACH STATEMENT` trigger that doesn't do anything has near-zero overhead.

`FOR EACH ROW` triggers have higher overhead than `FOR EACH STATEMENT` triggers, scaling, obviously, with the affected row counts.

`AFTER` triggers are more expensive than `BEFORE` triggers because they must be queued up until the statement finishes its work, then executed. They aren't spilled to disk if the queue gets big (at least in 9.4 and below; this may change in future), so huge `AFTER` trigger queues can overrun available memory, causing the statement to abort.
A trigger that modifies the `NEW` row before insert/update is cheaper than a trigger that does DML.
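For contrast, a trigger of the cheaper kind just edits `NEW` in flight - here a sketch that stamps a modification time (the `tasks` table and its `updated_at` column are assumed):

```sql
CREATE OR REPLACE FUNCTION touch_updated_at() RETURNS trigger AS $$
BEGIN
    NEW.updated_at := now();  -- mutate the row in memory; no extra DML issued
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER tasks_touch
BEFORE INSERT OR UPDATE ON tasks
FOR EACH ROW EXECUTE PROCEDURE touch_updated_at();
```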
The specific use case you want would perform better with an in-progress enhancement that might make it into PostgreSQL 9.5 (if we're lucky), where `FOR EACH STATEMENT` triggers can see virtual `OLD` and `NEW` tables. This isn't possible in current PostgreSQL versions, so you must use `FOR EACH ROW` triggers instead.
Has anyone tried this before?
Of course. It's a pretty standard use for triggers, along with auditing, sanity checking, etc.
You'll want to look into `LISTEN` and `NOTIFY` for a good way to wake up your worker when changes to the task table happen.
You're already doing the most important thing by avoiding talking to external systems directly from triggers. That tends to be problematic for performance and reliability. People often try to do things like send mail directly from a trigger, and that's bad news.
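A minimal sketch of that wake-up path (the channel name `task_changes` is made up):

```sql
-- Inside the PL/pgSQL trigger function: announce the change.
PERFORM pg_notify('task_changes', NEW.id::text);

-- In the worker's own session: block until a notification arrives,
-- instead of polling the log table in a tight loop.
LISTEN task_changes;
```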
Best Answer
By default, `UPDATE table_a ... FROM table_b` builds a `CROSS JOIN` of all rows from both tables, so such an `UPDATE` applies to all rows of `table_a`.

The solution is simple: add a condition to the `WHERE` clause to limit the rows of the `CROSS JOIN` result to only the just-now-updated rows. `NEW` and `OLD` are system `record` variables which hold the values of all columns of the row for which the trigger function has been called.
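A sketch of the fix, with made-up column names (`a.b_id` referencing `table_b.id`, and a `total` column being copied over):

```sql
-- Without the WHERE clause this would update every row of table_a.
UPDATE table_a a
SET    total  = b.total
FROM   table_b b
WHERE  b.id   = NEW.id    -- NEW: the table_b row the trigger fired for
AND    a.b_id = b.id;     -- join condition limiting the CROSS JOIN result
```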