PostgreSQL Statistics – Where Does Postgres Store All Statistics?

postgresqlstatistics

I recently ran into a weird problem while migrating from PG 9.4 => 11

I replicated the production server from PG 9.4 to PG 11 using pglogical. Everything worked out well, except we discovered that a few complex queries were taking tens of seconds on the PG 11 server, while there were taking under one second on the 9.4 server.

It turns out that running ANALYZE on the new server fixed the slow queries. So, I checked $PGDATA/pg_stat and $PGDATA/pg_stat_tmp and found them to be empty on both the servers!

Where does PG store statistics? (Follow-up question — doesn't pglogical collect statistics while it is replicating?)

Edit / follow-up:

Output of query on pg_stat_user_tables on the new PG 11 server –

select analyze_count,autoanalyze_count, count(*) from pg_stat_user_tables group by 1,2;
 analyze_count | autoanalyze_count | count 
---------------+-------------------+-------
             1 |                11 |     1
             4 |                 7 |     1
             1 |                 0 |    14
             4 |                 0 |   168
(4 rows)

Same query on old PG 9.4 server (current production server):

 analyze_count | autoanalyze_count | count 
---------------+-------------------+-------
             0 |                13 |     3
             0 |               226 |     1
             0 |                 1 |    31
             0 |                 5 |     6
             0 |                21 |     2
             0 |                16 |     2
             0 |               601 |     1
             0 |                 8 |     1
             0 |                20 |     1
             0 |                 4 |     2
             0 |                 9 |     2
             0 |                10 |     2
             0 |                12 |     1
             0 |                 7 |     2
             0 |                41 |     1
             0 |                 2 |    11
             0 |                 6 |     1
             0 |                 3 |     8
             0 |               731 |     1
             0 |                39 |     1
             0 |                11 |     1
             0 |                 0 |    99
             0 |                50 |     1
             0 |                66 |     1
             0 |               241 |     1
             0 |                14 |     2

Best Answer

There are two different kinds of statistics, and you are confusing them. One is about the distributions of values in the table columns and things like that. These are held in the table pg_catalog.pg_statistic, and the more-user-friendly view pg_stats. This is a more-or-less normal database table. These are gathered by ANALYZE and/or VACUUM, or their "auto" equivalents.

The other kind of statistics records how many updates, inserts, etc. have occured on each table. On a live server, you can see this data using the views pg_stat_user_tables and friends, but there are no real tables underlying these views. These statistics affect the first kind of statistics, because they are used to decide how often ANALYZE and VACUUM are auto-run. These are stored outside the normal database tables, in "pg_stats_tmp" (or whatever "stats_temp_directory" is set to if it is not the default) when the server is running, and moved to pg_stat for cold storage while the server is shut down. Upon a crash, all this data is discarded. If both of these directories are empty, then perhaps you have turned "track_counts" off, or perhaps you have set "stats_temp_directory" to a non-default value and so are looking in the wrong place. Or maybe there was a crash recently and they have not been rebuilt yet.

Follow-up question -- doesn't pglogical collect statistics while it is replicating?

If you have "track_counts" or "autovacuum" turned off, that would explain the lack of recent stats. Or, if you populated the new database extremely quickly, then maybe the automatic analyze just didn't have a chance to do its things yet. If none of those are the case, then you would have to consider whether there are no stats (of the pg_stats variety) at all, or if they existed but were just not fresh enough. Running this query might help you figure that out:

select analyze_count,autoanalyze_count from pg_stat_user_tables

But there is a pretty good chance you will never be able to reach a definitive conclusion, unless you are willing to recreate the situation from scratch and poke around in it.