Eelke is almost certainly correct that your locking is blocking autovacuum. Autovacuum is deliberately designed to give way to user activity. If those tables are locked, autovacuum cannot vacuum them.
For posterity, however, I wanted to give an example set of settings for hyper-aggressive autovacuum, since the settings you gave don't quite get there. Note, however, that making autovacuum more aggressive is unlikely to solve your problem. Also note that the default autovacuum settings are based on over 200 test runs using DBT2, seeking an optimal combination of settings, so the defaults should be assumed to be good unless you have a solid reason to think otherwise, or unless your database is significantly outside the mainstream for OLTP databases (e.g. a tiny database which gets 10K updates per second, or a 3 TB data warehouse).
First, turn on logging so you can check up on whether autovacuum is doing what you think it is:
log_autovacuum_min_duration = 0
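That logs every autovacuum run. You can also check the statistics views directly; for example, this query against the standard pg_stat_user_tables view shows when each table was last autovacuumed and how many dead tuples it has accumulated:
SELECT relname, last_autovacuum, last_autoanalyze, n_dead_tup
FROM   pg_stat_user_tables
ORDER  BY n_dead_tup DESC;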
Then let's add more autovacuum workers and have them check tables more often:
autovacuum_max_workers = 6
autovacuum_naptime = 15s
Let's lower the thresholds for auto-vacuum and auto-analyze so they trigger sooner:
autovacuum_vacuum_threshold = 25
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_threshold = 10
autovacuum_analyze_scale_factor = 0.05
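To see what these numbers mean in practice: autovacuum vacuums a table once dead rows exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. With the settings above, a 10,000-row table is vacuumed after 25 + 0.1 × 10,000 = 1,025 dead rows, and analyzed after 10 + 0.05 × 10,000 = 510 changed rows.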
Then let's make autovacuum less interruptible, so it completes faster, at the cost of a greater impact on concurrent user activity:
autovacuum_vacuum_cost_delay = 10ms
autovacuum_vacuum_cost_limit = 1000
There's your full program for generically aggressive autovacuum, which might be appropriate for a small database getting a very high rate of updates, but might have too great an impact on concurrent user activity.
Also, note that autovacuum parameters can be adjusted per table, which is almost always a better answer when you need to adjust autovacuum's behavior.
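For example, per-table storage parameters look like this (the table name and values here are just illustrative):
ALTER TABLE busy_table SET (
    autovacuum_vacuum_scale_factor = 0.01,
    autovacuum_vacuum_threshold = 100
);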
Again, though, it's unlikely to address your real problem.
Assumptions
Since information is missing in the Q, I'll assume:
- Your data comes from a file on the database server.
- The data is formatted just like COPY output, with a unique id per row to match the target table. If not, format it properly first or use COPY options to deal with the format.
- You are updating every single row in the target table, or most of them.
- You can afford to drop and recreate the target table. That means no concurrent access; else consider this related answer.
- There are no dependent objects at all, except for indexes.
Solution
I suggest you go with an approach similar to the one outlined at the link from your third bullet, with major optimizations.
To create the temporary table, there is a simpler and faster way:
CREATE TEMP TABLE tmp_tbl AS SELECT * FROM tbl LIMIT 0;
A single big UPDATE from a temporary table inside the database will be faster than individual updates from outside the database by several orders of magnitude.
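Assuming the id and column names used in the script further down, that single UPDATE might look like this (a sketch, adjust to your schema):
UPDATE tbl t
SET    field1 = u.field1
     , field2 = u.field2
FROM   tmp_tbl u
WHERE  t.id = u.id;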
In PostgreSQL's MVCC model, an UPDATE means creating a new row version and marking the old one as deleted. That's about as expensive as an INSERT and a DELETE combined. Plus, it leaves you with a lot of dead tuples. Since you are updating the whole table anyway, it would be faster overall to just create a new table and drop the old one.
If you have enough RAM available, set temp_buffers (only for this session!) high enough to hold the temp table in RAM - before you do anything else.
To get an estimate of how much RAM is needed, run a test with a small sample and use the database object size functions:
SELECT pg_size_pretty(pg_relation_size('tmp_tbl')); -- complete size of table
SELECT pg_column_size(t) FROM tmp_tbl t LIMIT 10; -- size of sample rows
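To extrapolate from the sample, multiply the average row size by the expected row count; for instance (the 10 million here is a made-up row count):
SELECT pg_size_pretty((avg(pg_column_size(t)) * 10000000)::bigint)
FROM   tmp_tbl t; -- rough estimate for 10M rows of this shape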
Complete script
SET temp_buffers = '1GB'; -- example value
CREATE TEMP TABLE tmp_tbl AS SELECT * FROM tbl LIMIT 0;
COPY tmp_tbl FROM '/absolute/path/to/file';
CREATE TABLE tbl_new AS
SELECT t.col1, t.col2, u.field1, u.field2
FROM tbl t
JOIN tmp_tbl u USING (id);
-- Create indexes like in original table
ALTER TABLE tbl_new ADD PRIMARY KEY ...;
CREATE INDEX ... ON tbl_new (...);
CREATE INDEX ... ON tbl_new (...);
-- exclusive lock on tbl for a very brief time window!
DROP TABLE tbl;
ALTER TABLE tbl_new RENAME TO tbl;
DROP TABLE tmp_tbl; -- will also be dropped at end of session automatically
Concurrent load
Concurrent operations on the table (which I ruled out in the assumptions at the start) will wait once the table is locked near the end, and will fail as soon as the transaction is committed, because the table name is resolved to its OID immediately, but the new table has a different OID. The table stays consistent, but concurrent operations may get an exception and have to be repeated. Details in this related answer.
UPDATE route
If you (have to) go the UPDATE route, drop any index that is not needed during the update and recreate it afterwards. It is much cheaper to create an index in one piece than to update it for every individual row. This may also allow for HOT updates.
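A rough sketch (the index name and column are placeholders):
DROP INDEX tbl_field1_idx; -- index on an updated column, not needed during the UPDATE
-- ... run the big UPDATE here ...
CREATE INDEX tbl_field1_idx ON tbl (field1); -- rebuild in one pass afterwards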
I outlined a similar procedure using UPDATE in this closely related answer on SO.
Best Answer
Yes, this will modify all rows, even if the value doesn't change.
If you want to avoid that, either use a WHERE condition that skips rows whose values would not change, or use the suppress_redundant_updates_trigger() trigger function to define a BEFORE UPDATE trigger on the table. Avoiding the unnecessary updates will improve performance and reduce table bloat.
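Both variants sketched, with placeholder table and column names:
-- WHERE variant: skip rows whose value would not change
UPDATE tbl
SET    col = 'new_value'
WHERE  col IS DISTINCT FROM 'new_value';

-- trigger variant: suppress all redundant updates on the table
CREATE TRIGGER tbl_suppress_redundant
BEFORE UPDATE ON tbl
FOR EACH ROW
EXECUTE FUNCTION suppress_redundant_updates_trigger();
On PostgreSQL versions before 11, write EXECUTE PROCEDURE instead of EXECUTE FUNCTION.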