Postgresql – How to optimize a table for a very high-frequency updates

postgresqlupdatevacuum

I have a table that contains a list of tasks that need to be run periodically:

applaudience=> \d+ maintenance_task
                                                                       Table "public.maintenance_task"
               Column               |           Type           | Collation | Nullable |                   Default                    | Storage  | Stats target | Description
------------------------------------+--------------------------+-----------+----------+----------------------------------------------+----------+--------------+-------------
 id                                 | integer                  |           | not null | nextval('maintenance_task_id_seq'::regclass) | plain    |              |
 nid                                | citext                   |           | not null |                                              | extended |              |
 execution_interval                 | interval                 |           | not null |                                              | plain    |              |
 last_attempted_at                  | timestamp with time zone |           |          | now()                                        | plain    |              |
 last_maintenance_task_execution_id | integer                  |           |          |                                              | plain    |              |
 disabled_at                        | timestamp with time zone |           |          |                                              | plain    |              |
 maximum_execution_duration         | interval                 |           | not null | '00:05:00'::interval                         | plain    |              |
 maximum_concurrent_execution_count | integer                  |           | not null | 0                                            | plain    |              |
 last_exhausted_at                  | timestamp with time zone |           | not null | now()                                        | plain    |              |
Indexes:
    "maintenance_task_pkey" PRIMARY KEY, btree (id)
    "maintenance_task_name_idx" UNIQUE, btree (nid)
Foreign-key constraints:
    "maintenance_task_last_maintenance_task_execution_id_fkey" FOREIGN KEY (last_maintenance_task_execution_id) REFERENCES maintenance_task_execution(id) ON DELETE SET NULL
Referenced by:
    TABLE "maintenance_task_execution" CONSTRAINT "maintenance_task_execution_maintenance_task_id_fkey" FOREIGN KEY (maintenance_task_id) REFERENCES maintenance_task(id) ON DELETE CASCADE
Options: autovacuum_vacuum_threshold=0, autovacuum_analyze_threshold=0, fillfactor=50

Every time a task is selected to be executed, we update the value of last_attempted_at. The following query is used to schedule new tasks:

CREATE OR REPLACE FUNCTION schedule_maintenance_task()
RETURNS table(maintenance_task_id int)
AS $$
BEGIN
  RETURN QUERY
  EXECUTE $q$
    UPDATE maintenance_task
    SET last_attempted_at = now()
    WHERE
      id = (
        WITH
          active_maintenance_task_execution_count AS (
            SELECT DISTINCT ON (maintenance_task_id)
              maintenance_task_id,
              execution_count
            FROM (
              SELECT
                id maintenance_task_id,
                0 execution_count
              FROM maintenance_task
              UNION
              SELECT
                mte1.maintenance_task_id,
                count(*) execution_count
              FROM maintenance_task_execution mte1
              WHERE
                mte1.ended_at IS NULL
              GROUP BY mte1.maintenance_task_id
            ) AS t
            ORDER BY
              maintenance_task_id,
              execution_count DESC
          )
        SELECT mt1.id
        FROM maintenance_task mt1
        INNER JOIN active_maintenance_task_execution_count amtec1 ON amtec1.maintenance_task_id = mt1.id
        WHERE
          mt1.disabled_at IS NULL AND
          mt1.maximum_concurrent_execution_count >= amtec1.execution_count AND
          (
            mt1.last_attempted_at < now() - mt1.execution_interval OR
            mt1.last_exhausted_at < now() - mt1.execution_interval
          )
        ORDER BY
          mt1.last_attempted_at ASC
        LIMIT 1
        FOR UPDATE OF mt1 SKIP LOCKED
      )
    RETURNING id
  $q$;
END
$$
LANGUAGE plpgsql
SET work_mem='50MB';

schedule_maintenance_task query is being run at approximately 600/ minute rate.

The problems start to occur after about 24 hours:

applaudience=> EXPLAIN (analyze, buffers)
applaudience-> SELECT id
applaudience-> FROM maintenance_task;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Seq Scan on maintenance_task  (cost=0.00..7715.86 rows=286886 width=4) (actual time=3.675..385.042 rows=31 loops=1)
   Buffers: shared hit=9455
 Planning time: 0.236 ms
 Execution time: 385.060 ms
(4 rows)

applaudience=> SELECT *
applaudience-> FROM pg_stat_all_tables
applaudience-> WHERE schemaname = 'public' AND relname = 'maintenance_task';
  relid   | schemaname |     relname      | seq_scan | seq_tup_read | idx_scan | idx_tup_fetch | n_tup_ins | n_tup_upd | n_tup_del | n_tup_hot_upd | n_live_tup | n_dead_tup | n_mod_since_analyze | last_vacuum |        last_autovacuum        | last_analyze |       last_autoanalyze        | vacuum_count | autovacuum_count | analyze_count | autoanalyze_count
----------+------------+------------------+----------+--------------+----------+---------------+-----------+-----------+-----------+---------------+------------+------------+---------------------+-------------+-------------------------------+--------------+-------------------------------+--------------+------------------+---------------+-------------------
 22903432 | public     | maintenance_task |   163230 |      5060130 |  5571795 |       7988441 |         0 |    185359 |         0 |        172989 |     148568 |     138285 |                9733 |             | 2018-12-09 11:00:33.978177+00 |              | 2018-12-09 10:01:07.945327+00 |            0 |             6922 |             0 |              1416
(1 row)

The number of dead tuples grows to 100k+. A simple seq scan needs to read 9k+ buffers to fetch 31 rows.

Here is a VACUUM VERBOSE maintenance_task log:

INFO:  vacuuming "public.maintenance_task"
INFO:  index "maintenance_task_pkey" now contains 9555 row versions in 331 pages
DETAIL:  0 index row versions were removed.
282 index pages have been deleted, 282 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  index "maintenance_task_name_idx" now contains 9555 row versions in 787 pages
DETAIL:  0 index row versions were removed.
690 index pages have been deleted, 690 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "maintenance_task": found 0 removable, 145247 nonremovable row versions in 2459 out of 4847 pages
DETAIL:  145217 dead row versions cannot be removed yet, oldest xmin: 928967630
There were 180 unused item pointers.
Skipped 1 page due to buffer pins, 2387 frozen pages.
0 pages are entirely empty.
CPU: user: 0.05 s, system: 0.00 s, elapsed: 0.34 s.
INFO:  vacuuming "pg_toast.pg_toast_22903432"
INFO:  index "pg_toast_22903432_index" now contains 0 row versions in 1 pages
DETAIL:  0 index row versions were removed.
0 index pages have been deleted, 0 are currently reusable.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
INFO:  "pg_toast_22903432": found 0 removable, 0 nonremovable row versions in 0 out of 0 pages
DETAIL:  0 dead row versions cannot be removed yet, oldest xmin: 928967630
There were 0 unused item pointers.
Skipped 0 pages due to buffer pins, 0 frozen pages.
0 pages are entirely empty.
CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
VACUUM

What can be done to prevent the growing number of the dead tuples/ slowing down of the scheduling query?

Best Answer

The setting old_snapshot_threshold was added in Postgres 9.6 for cases like this. A long-open query will not be allowed to hold back the vacuum indefinitely. If the query never needs data that might have been vacuumed away, it will complete as normal. If finds it does need such data, it will throw an error. And if it is part of a forgotten-about connection, it will hang around indefinitely but not cause table bloat while doing so.

Note, however, that this setting prevents autovacuum from returning freed space at the end of relations to the operating system, since that is needed to detect the error condition. Only a manual VACUUM FULL will still force it. So while it can help to fight one type of table bloat (exactly your problem), it can lead to another kind (often less critical).

Alternatively consider idle_in_transaction_session_timeout, which terminates sessions being idle for too long. Chose your settings wisely.