I have a table that contains a list of tasks that need to be run periodically:
applaudience=> \d+ maintenance_task
Table "public.maintenance_task"
Column | Type | Collation | Nullable | Default | Storage | Stats target | Description
------------------------------------+--------------------------+-----------+----------+----------------------------------------------+----------+--------------+-------------
id | integer | | not null | nextval('maintenance_task_id_seq'::regclass) | plain | |
nid | citext | | not null | | extended | |
execution_interval | interval | | not null | | plain | |
last_attempted_at | timestamp with time zone | | | now() | plain | |
last_maintenance_task_execution_id | integer | | | | plain | |
disabled_at | timestamp with time zone | | | | plain | |
maximum_execution_duration | interval | | not null | '00:05:00'::interval | plain | |
maximum_concurrent_execution_count | integer | | not null | 0 | plain | |
last_exhausted_at | timestamp with time zone | | not null | now() | plain | |
Indexes:
"maintenance_task_pkey" PRIMARY KEY, btree (id)
"maintenance_task_name_idx" UNIQUE, btree (nid)
Foreign-key constraints:
"maintenance_task_last_maintenance_task_execution_id_fkey" FOREIGN KEY (last_maintenance_task_execution_id) REFERENCES maintenance_task_execution(id) ON DELETE SET NULL
Referenced by:
TABLE "maintenance_task_execution" CONSTRAINT "maintenance_task_execution_maintenance_task_id_fkey" FOREIGN KEY (maintenance_task_id) REFERENCES maintenance_task(id) ON DELETE CASCADE
Options: autovacuum_vacuum_threshold=0, autovacuum_analyze_threshold=0, fillfactor=50
Every time a task is selected for execution, we update the value of last_attempted_at. The following query is used to schedule new tasks:
CREATE OR REPLACE FUNCTION schedule_maintenance_task()
RETURNS table(maintenance_task_id int)
AS $$
BEGIN
RETURN QUERY
EXECUTE $q$
UPDATE maintenance_task
SET last_attempted_at = now()
WHERE
id = (
WITH
active_maintenance_task_execution_count AS (
SELECT DISTINCT ON (maintenance_task_id)
maintenance_task_id,
execution_count
FROM (
SELECT
id maintenance_task_id,
0 execution_count
FROM maintenance_task
UNION
SELECT
mte1.maintenance_task_id,
count(*) execution_count
FROM maintenance_task_execution mte1
WHERE
mte1.ended_at IS NULL
GROUP BY mte1.maintenance_task_id
) AS t
ORDER BY
maintenance_task_id,
execution_count DESC
)
SELECT mt1.id
FROM maintenance_task mt1
INNER JOIN active_maintenance_task_execution_count amtec1 ON amtec1.maintenance_task_id = mt1.id
WHERE
mt1.disabled_at IS NULL AND
mt1.maximum_concurrent_execution_count >= amtec1.execution_count AND
(
mt1.last_attempted_at < now() - mt1.execution_interval OR
mt1.last_exhausted_at < now() - mt1.execution_interval
)
ORDER BY
mt1.last_attempted_at ASC
LIMIT 1
FOR UPDATE OF mt1 SKIP LOCKED
)
RETURNING id
$q$;
END
$$
LANGUAGE plpgsql
SET work_mem='50MB';
The schedule_maintenance_task query is being run at a rate of approximately 600/minute.
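For context, a worker picks up a task by calling the function; the transaction handling around the call below is an assumption about the application code, not part of the original setup:

```sql
-- Hypothetical worker-side invocation (sketch).
-- The row lock taken by FOR UPDATE ... SKIP LOCKED inside the function
-- is held until this transaction ends, so the transaction should stay short.
BEGIN;
SELECT maintenance_task_id FROM schedule_maintenance_task();
-- ... record a maintenance_task_execution row, hand the task to the worker ...
COMMIT;
```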
The problems start to occur after about 24 hours:
applaudience=> EXPLAIN (analyze, buffers)
applaudience-> SELECT id
applaudience-> FROM maintenance_task;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Seq Scan on maintenance_task (cost=0.00..7715.86 rows=286886 width=4) (actual time=3.675..385.042 rows=31 loops=1)
Buffers: shared hit=9455
Planning time: 0.236 ms
Execution time: 385.060 ms
(4 rows)
applaudience=> SELECT *
applaudience-> FROM pg_stat_all_tables
applaudience-> WHERE schemaname = 'public' AND relname = 'maintenance_task';
relid | schemaname | relname | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum | last_autovacuum | last_analyze | last_autoanalyze | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
----------+------------+------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-------------------------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
22903432 | public | maintenance_task | 163230 | 5060130 | 5571795 | 7988441 | 0 | 185359 | 0 | 172989 | 148568 | 138285 | 9733 | | 2018-12-09 11:00:33.978177+00 | | 2018-12-09 10:01:07.945327+00 | 0 | 6922 | 0 | 1416
(1 row)
The number of dead tuples grows to 100k+. A simple seq scan needs to read 9k+ buffers to fetch 31 rows.
Here is a VACUUM VERBOSE maintenance_task log:
INFO: vacuuming "public.maintenance_task"
INFO: index "maintenance_task_pkey" now contains 9555 row versions in 331 pages
DETAIL: 0 index row versions were removed.
282 index pages have been deleted, 282 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: index "maintenance_task_name_idx" now contains 9555 row versions in 787 pages
DETAIL: 0 index row versions were removed.
690 index pages have been deleted, 690 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "maintenance_task": found 0 removable, 145247 nonremovable row versions in 2459 out of 4847 pages
DETAIL: 145217 dead row versions cannot be removed yet, oldest xmin: 928967630
There were 180 unused item pointers.
Skipped 1 page due to buffer pins, 2387 frozen pages.
0 pages are entirely empty.
CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.34 s.
INFO: vacuuming "pg_toast.pg_toast_22903432"
INFO: index "pg_toast_22903432_index" now contains 0 row versions in 1 pages
DETAIL: 0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO: "pg_toast_22903432": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 928967630
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM
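The "oldest xmin" in the log above points at some transaction (or replication slot) that is holding back vacuum. A diagnostic query along these lines against pg_stat_activity can locate the offending session; this is a sketch added for illustration, not part of the original post:

```sql
-- Find sessions that may be holding back the xmin horizon.
-- backend_xmin is reported in pg_stat_activity since PostgreSQL 9.4.
SELECT pid, state, backend_xmin, xact_start, query
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY age(backend_xmin) DESC
LIMIT 5;
```

A session stuck in state "idle in transaction" with an old backend_xmin would explain why 145217 dead row versions cannot be removed.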
What can be done to prevent the growing number of dead tuples and the resulting slowdown of the scheduling query?
Best Answer
The setting
old_snapshot_threshold
was added in Postgres 9.6 for cases like this. A long-open query will not be allowed to hold back vacuum indefinitely. If the query never needs data that might have been vacuumed away, it completes as normal. If it finds it does need such data, it throws a "snapshot too old" error. And if it is part of a forgotten-about connection, it will hang around indefinitely, but without causing table bloat while doing so.
Note, however, that this setting prevents
autovacuum
from returning freed space at the end of relations to the operating system, since that space is needed to detect the error condition. Only a manual VACUUM FULL
will still reclaim it. So while it can help fight one type of table bloat (exactly your problem), it can lead to another kind (often less critical).
Alternatively, consider
idle_in_transaction_session_timeout
, which terminates sessions that sit idle inside a transaction for too long. Choose your settings wisely.
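As a sketch, both settings can be set in postgresql.conf or via ALTER SYSTEM; the values below are illustrative assumptions, not recommendations:

```sql
-- Illustrative values only; tune to your workload.
-- Queries whose snapshot is older than this may fail with
-- "snapshot too old" instead of holding back vacuum:
ALTER SYSTEM SET old_snapshot_threshold = '1h';  -- requires a server restart
-- Terminate sessions that sit idle inside a transaction too long:
ALTER SYSTEM SET idle_in_transaction_session_timeout = '10min';
SELECT pg_reload_conf();  -- applies the timeout; old_snapshot_threshold needs a restart
```

Note that old_snapshot_threshold can only be changed with a server restart, while idle_in_transaction_session_timeout takes effect on reload (or per session).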