I use PostgreSQL 9.6.
I have a query that joins 17 tables, 9 of those having several million rows. The query was running fine but its performance degraded rapidly this week. EXPLAIN's output didn't help (all scans are index scans except for the very small tables) and I had to try and remove tables from the query to isolate one that caused the degradation.
It turns out that an unremarkable table containing 40 rows broke the query: 800 ms without the table vs 30 s with it. I ran VACUUM FULL on the table, which ran in about a second and now the performance is back to normal.
My questions:
- What can explain that a <10kb table breaks performance like this?
- How to avoid the same problem in the future?
In the process of debugging, I made a base backup to another server, so I have two filesystem-level copies of the DB, one of them where I didn't run VACUUM FULL. When I logged on to the unvacuumed copy with pgAdmin, I got the following message:
The estimated rowcount on the table "public.clients" deviates significantly from the actual rowcount. You should run VACUUM ANALYZE on this table.
The unvacuumed table has 40 rows counted and 0 estimated. Here are the rest of its statistics in a screenshot.
Best Answer
The table may be small, but as long as Postgres expects roughly 0 rows, chances are it is going to chose a different query plan than for roughly 40 rows - for which the same query plan is not as efficient.
Since joins multiply result rows rather than just adding to them, the 40 rows in the tiny table can have a massive effect when joined to big tables with several million rows like in your example. The difference can easily explain a factor 30 in execution time.
Or as the manual puts it:
Default
autovacuum
settings are ok for most installations. Consider:But for a database holding multiple tables with several millions rows, I would consider tuning per-table settings for selected tables and a manual
ANALYZE
on the whole DB from time to time.The remaining questions
Q1. Why did autovacuum not launch
ANALYZE
automatically?Q2. Why did
VACUUM FULL
fix the problem?Q2 is simple: While other important statistics are only updated by
ANALYZE
, the basic count estimate inpg_class.reltuples
is updated more often. The manual:Q1 is more sophisticated.
The manual again:
Relevant settings (among others):
Bold emphasis mine.
Demo
Be sure the test DB is mostly idle to avoid testing artifacts and you are running with default settings:
Most importantly:
Basically, autovacuum checks once every minute whether any table has last_estimate / 100 + 50 rows changed and launches
ANALYZE
for those.To understand what happened in your case:
pg_class.reltuples
is the estimated row count of the table. More here:You'll get
0
. Wait for 2 minutes to make sure we cross the 1 minute delay. Check again. Still0
. Now insert one more row and check again:Still
0
. Wait another 2 minutes, check again. Tada! We see the updated count of51
. Autovacuum did not kick in until 51 rows were inserted (or updated or deleted).To see more details (including the timestamp of the
last_autoanalyze
):Related:
Solution
Run
ANALYZE
onpublic.clients
manually once (or on the whole DB, it's cheap) and use more aggressive per-table autovacuum settings for this important table. Like:You might also want to audit the settings for some of your big tables for other reasons. Compare:
Also important
You are joining 17 tables, which is well beyond the default setting for
join_collapse_limit
which is 8. You may want to use explicit join syntax (maybe you already do) and rewrite your query to place the most selective tables (or the ones with the most selective predicates) first in theSELECT
list. Related:Complex view becomes slow when adding ORDER BY in outer query with small LIMIT
Can we execute an optimal plan instead of a generic one in the first execution of a PL/pgSQL function?
P.S.: I think I found a minor documentation bug while running the test above. The manual on
autovacuum_analyze_threshold
reads:This would indicate that 50 inserts trigger
ANALYZE
, not 51 like I observed. Similar inpg_settings.short_desc
:In fact, the explanation for autovacuum in the manual here matches my observation:
The first two sentences seem slightly incorrect.
I filed a bug report.