Normally we'd expect that when Postgres was restarted, the crash recovery process would have removed files belonging to a rolled-back index from the data directory.
Let's assume that it didn't work, or at least that it has to be checked manually.
The list of files that should be in the datadir can be established with a query like this:
select pg_relation_filenode(oid)
from pg_class
where relkind in ('i','r','t','S','m')
and reltablespace=0
order by 1;
reltablespace=0 is for the default tablespace. If the problematic index was created in a non-default tablespace, this 0 must be replaced by its OID in pg_tablespace.
The relkind values i, r, t, S, m correspond respectively to indexes, tables, TOAST tables, sequences, and materialized views. All these objects have their data in files whose names match pg_relation_filenode(oid).
On disk, the data files are below $PGDATA/base/oid/, where oid is the OID of the database obtained by select oid,datname from pg_database. For a non-default tablespace, base is replaced by PG_version_somelabel instead.
List and sort the files matching relfilenodes in that directory:
ls | grep -E '^[0-9]+$' | sort -n > /tmp/list-of-relations.txt
(That actually keeps only the first segment of relations larger than 1 GB. If there are lingering extra segments not attached to anything, they should be considered separately.)
and diff that file with the result of the query above.
If there are lingering data files that do not correspond to any object that the db knows about, they should appear in that diff.
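As a toy illustration of that comparison (the filenode numbers below are invented, not from a real cluster), a file that exists on disk but is absent from the query output shows up as a >-prefixed line in the diff:

```shell
# Stand-ins for the query output and the directory listing
# (filenode numbers are invented for this sketch):
printf '16384\n16390\n16402\n'        > /tmp/expected-filenodes.txt
printf '16384\n16390\n16395\n16402\n' > /tmp/list-of-relations.txt
# Lines marked '>' are files on disk with no matching relation;
# diff exits 1 when the files differ, which is expected here:
diff /tmp/expected-filenodes.txt /tmp/list-of-relations.txt || true
```

Here 16395 would be the orphaned file to investigate before deleting anything.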
The table may be small, but as long as Postgres expects roughly 0 rows, chances are it is going to choose a different query plan than for roughly 40 rows - and that plan is not as efficient for the actual row count.
Since joins multiply result rows rather than just adding to them, the 40 rows in the tiny table can have a massive effect when joined to big tables with several million rows like in your example. The difference can easily explain a factor 30 in execution time.
Or as the manual puts it:
It is important to have reasonably accurate statistics, otherwise poor
choices of plans might degrade database performance.
Default autovacuum settings are OK for most installations. But for a database holding multiple tables with several million rows, I would consider tuning per-table settings for selected tables, plus a manual ANALYZE on the whole DB from time to time.
The remaining questions
Q1. Why did autovacuum not launch ANALYZE automatically?
Q2. Why did VACUUM FULL fix the problem?
Q2 is simple: While other important statistics are only updated by ANALYZE, the basic count estimate in pg_class.reltuples is updated more often. The manual:
Number of rows in the table. This is only an estimate used by the planner. It is updated by VACUUM, ANALYZE, and a few DDL commands such as CREATE INDEX.
Q1 is more sophisticated.
The manual again:
The daemon schedules ANALYZE strictly as a function of the number of rows inserted or updated; it has no knowledge of whether that will lead to meaningful statistical changes.
Relevant settings (among others):
autovacuum_analyze_threshold (integer)
Specifies the minimum number of inserted, updated or deleted tuples needed to trigger an ANALYZE in any one table. The default is 50 tuples. This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
autovacuum_analyze_scale_factor (floating point)
Specifies a fraction of the table size to add to autovacuum_analyze_threshold when deciding whether to trigger an ANALYZE. The default is 0.1 (10% of table size). This parameter can only be set in the postgresql.conf file or on the server command line; but the setting can be overridden for individual tables by changing table storage parameters.
Demo
Be sure the test DB is mostly idle to avoid testing artifacts and you are running with default settings:
SELECT * FROM pg_settings WHERE name ~ '^autovacuum|track_counts';
Most importantly:
autovacuum_analyze_scale_factor = 0.1
autovacuum_analyze_threshold = 50
autovacuum_naptime = 60
track_counts = on
Basically, autovacuum checks once every minute whether any table has last_estimate / 10 + 50 rows changed, and launches ANALYZE for those.
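As a quick sanity check of that formula (a simplification of the real daemon logic, sketched with shell arithmetic; the numbers are illustrative):

```shell
# Sketch of the analyze trigger with default settings:
#   ANALYZE fires once changed rows exceed threshold + scale_factor * reltuples
reltuples=1000000                           # planner's current row estimate
threshold=50                                # autovacuum_analyze_threshold
trigger=$(( threshold + reltuples / 10 ))   # scale_factor 0.1 == one tenth
echo "analyze after more than $trigger changed rows"
```

With reltuples = 0, as for a freshly created table, the trigger is just the bare threshold of 50, which is why the 51st changed row sets it off in the demo.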
To understand what happened in your case:
CREATE TABLE t50 (id int primary key, foo text);
INSERT INTO t50 SELECT g, 'txt' || g FROM generate_series(1,50) g;
SELECT reltuples FROM pg_class WHERE oid = 't50'::regclass;
pg_class.reltuples is the estimated row count of the table.
You'll get 0. Wait for 2 minutes to make sure we cross the 1-minute delay. Check again. Still 0. Now insert one more row and check again:
INSERT INTO t50 VALUES (51, 'txt51 triggers analyze');
SELECT reltuples FROM pg_class WHERE oid = 't50'::regclass;
Still 0. Wait another 2 minutes, check again. Tada! We see the updated count of 51. Autovacuum did not kick in until 51 rows were inserted (or updated or deleted).
To see more details (including the timestamp of the last_autoanalyze):
SELECT * FROM pg_stat_all_tables WHERE relid = 't50'::regclass;
Solution
Run ANALYZE on public.clients manually once (or on the whole DB, it's cheap) and use more aggressive per-table autovacuum settings for this important table. Like:
ALTER TABLE public.clients SET (autovacuum_analyze_scale_factor = 0.01,
                                autovacuum_analyze_threshold = 10);
You might also want to audit the settings for some of your big tables for other reasons.
Also important
You are joining 17 tables, which is well beyond the default setting for join_collapse_limit, which is 8. You may want to use explicit join syntax (maybe you already do) and rewrite your query to place the most selective tables (or the ones with the most selective predicates) first in the FROM list.
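For illustration only (the table names and predicates here are made up, not taken from your schema), the idea looks like this:

```sql
-- Hypothetical sketch: with join_collapse_limit = 1 the planner keeps the
-- textual join order, so the most selective table should come first.
SET join_collapse_limit = 1;

SELECT c.id, o.order_date
FROM   clients c                         -- most selective predicate first
JOIN   orders  o ON o.client_id = c.id   -- big tables joined afterwards
JOIN   items   i ON i.order_id  = o.id
WHERE  c.created_at > now() - interval '7 days';
```

Alternatively, raising join_collapse_limit (and from_collapse_limit) above 17 lets the planner search join orders itself, at the cost of longer planning time.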
P.S.:
I think I found a minor documentation bug while running the test above. The manual on autovacuum_analyze_threshold reads:
Specifies the minimum number of inserted, updated or deleted tuples needed to trigger an ANALYZE in any one table.
This would indicate that 50 inserts trigger ANALYZE, not 51 like I observed. Similar in pg_settings.short_desc:
Minimum number of tuple inserts, updates, or deletes prior to analyze.
In fact, the explanation of autovacuum in the manual matches my observation:
Otherwise, if the number of tuples obsoleted since the last VACUUM exceeds the "vacuum threshold", the table is vacuumed.
The first two sentences seem slightly incorrect.
I filed a bug report.
Best Answer
You can see the progress of vacuum with
select * from pg_stat_progress_vacuum;
Here are some tips for vacuuming large tables
https://medium.com/coding-blocks/optimizing-storage-and-managing-cleanup-in-postgresql-c2fe56d4cf5
You do not have to wait too long for autovacuuming