I would like to understand Postgres replication options better, and while doing my research I came across the concept of autovacuuming as part of routine DB maintenance. My question is whether autovacuum settings persist across replications. My initial thought is no, because some of these parameters are set at the server level, but there are also table- and DB-specific autovacuum options. If I set certain autovacuum properties on a specific Postgres table or DB, would these properties carry over to replicas of the DBs? Thanks in advance for the help.
Postgresql – Do postgres autovacuum properties persist for DB replications
autovacuum postgresql replication
Related Solutions
Postgresql – Increasing work_mem and shared_buffers on Postgres 9.2 significantly slows down queries
First of all, keep in mind that work_mem is per operation, so it can get excessive pretty quickly. In general, if you are not having trouble with slow sorts, I would leave work_mem alone until you need it.
Looking at your query plans, one thing that strikes me is that the buffer hits look very different between the two plans, and that even the sequential scans are slower. I suspect the issue has to do with read-ahead caching and having less space for it. What this means is that you are biasing memory toward re-use of indexes and against reading tables from disk.
My understanding is that PostgreSQL will look to its own cache for a page before reading it from disk, because it doesn't really know whether the OS cache will contain that page. Because the pages then stay in that cache, and because that cache is slower than the OS cache, this changes which sorts of queries are fast and which are slow. In fact, reading the plans, aside from work_mem issues it looks like all of your query info comes from cache, but it is a question of which cache.
work_mem: how much memory we can allocate for a sort or related join operation. This is per operation, not per statement or per backend, so a single complex query can use many times this amount of memory. It isn't clear you are hitting this limit, but it is worth noting and being aware of. If you increase this too far, you lose memory that might be available for the read cache and the shared buffers.
shared_buffers: how much memory to allocate to the actual PostgreSQL page queue. Ideally, the interesting set of your database will stay cached in memory here and in the read buffers. However, what this does is ensure that the most frequently used information across all backends gets cached and not flushed to disk. On Linux this cache is significantly slower than the OS disk cache, but it offers guarantees that the OS disk cache does not, and it is transparent to PostgreSQL. This is pretty clearly where your problem is.
So what happens is that when we have a request, we check the shared buffers first since PostgreSQL has deep knowledge of this cache, and look for the pages. If they are not there we ask the OS to open them from the file, and if the OS has cached the result it returns the cached copy (this is faster than the shared buffers, but Pg can't tell whether it is cached or on disk, and disk is much slower so PostgreSQL typically will not take that chance). Keep in mind this affects random vs sequential page access as well. So you may get better performance with lower shared_buffers settings.
My gut sense is that you probably get better, or at least more consistent, performance in high concurrency environments with larger shared_buffer settings. Also keep in mind that PostgreSQL grabs this memory and holds it so if you have other things running on the system, the read buffers will hold files read by other processes. It's a very large and complex topic. Larger shared buffer settings provide better guarantees of performance but may deliver less performance in some cases.
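Before changing the server-wide setting, a minimal sketch of how to experiment (the table and column names here are hypothetical) is to raise work_mem for a single session and compare the buffer counters reported by EXPLAIN (ANALYZE, BUFFERS):

    -- Raise work_mem for this session only; "shared hit" vs "read" counts in
    -- the resulting plan show whether pages came from shared_buffers or from
    -- the OS cache / disk.
    SET work_mem = '64MB';

    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM big_table           -- hypothetical table
    ORDER BY created_at;     -- a sort that may spill to disk with a small work_mem

    RESET work_mem;          -- return to the configured default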
First things first — 8.4 is no longer supported, so consider upgrading.
Autovacuum settings are documented.
Let's focus on the settings that affect when autovacuum kicks in. As you might know, this process is responsible for both vacuuming and analyzing tables.
One of the settings affecting ANALYZE frequency is autovacuum_analyze_threshold. As you can read in the manual, this parameter specifies the minimum number of rows that must change for ANALYZE to be triggered. This serves well for small tables, but on big ones and/or tables with high activity it would lead to far too frequent analyzing. To avoid this, another parameter exists, namely autovacuum_analyze_scale_factor. It specifies the fraction of the table size to add to the threshold when checking whether ANALYZE should kick in, so the effective trigger condition is: rows changed > autovacuum_analyze_threshold + autovacuum_analyze_scale_factor * table size.
Say we have a table with 10,000 rows and 200 of them have changed:

- autovacuum_analyze_threshold tells us that we're over the default of 50;
- we calculate the fraction based on autovacuum_analyze_scale_factor (which defaults to 0.1), and this gives us 1,000 rows;
- the total calculated threshold is therefore 1,050;
- as 200 is less than 1,050, ANALYZE is not started (we wait for more changes).
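To check this arithmetic against live statistics, a diagnostic sketch along these lines can be used (it assumes the default threshold of 50 and scale factor of 0.1, and relies on n_mod_since_analyze, available from PostgreSQL 9.4 on):

    -- Compare rows changed since the last ANALYZE with the calculated threshold.
    SELECT s.relname,
           s.n_mod_since_analyze,
           50 + 0.1 * c.reltuples AS analyze_threshold,
           s.n_mod_since_analyze > 50 + 0.1 * c.reltuples AS analyze_due
    FROM pg_stat_user_tables s
    JOIN pg_class c ON c.oid = s.relid;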
For VACUUM there's an analogous pair of parameters, autovacuum_vacuum_threshold and autovacuum_vacuum_scale_factor, except that the default scale factor for vacuuming is 0.2, or 20%.
Now, as you can guess, the bigger your table becomes, the longer it takes to trigger autovacuum on it. Therefore, for bigger tables (typically over 1M rows) it is highly recommended to adjust these settings on a per-table basis, using the ALTER TABLE ... SET ( storage_parameter = ... ) syntax as in the sketch below.
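For example (the table name events is hypothetical, and the numbers are a starting point to tune, not a recommendation):

    -- Trigger vacuum/analyze after ~10,000 changed rows plus 1% of the table,
    -- instead of the default 20% / 10% scale factors.
    ALTER TABLE events SET (
        autovacuum_vacuum_scale_factor  = 0.01,
        autovacuum_vacuum_threshold     = 10000,
        autovacuum_analyze_scale_factor = 0.01,
        autovacuum_analyze_threshold    = 10000
    );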
It might be tempting to set *_scale_factor to 0 and increase only the threshold for the bigger tables. Still, it is better to keep the factor at a small but non-zero value, as on a table with high activity a purely fixed threshold of, say, 100,000 changed rows might be crossed too frequently, causing unnecessary autovacuuming. See this thread in the pgsql-performance list.
Best Answer
VACUUM (both manual and auto) does not run on a physical replica. The results of a VACUUM running on the master get replicated over, just like other changes do. Per-table vacuum settings are stored in database catalog tables, and so get replicated over as well, but they have no effect unless/until the replica gets promoted.
For a logical replica, VACUUM settings are completely separate from the publisher's: both the server-level and the table-level settings are just whatever you set them to on the subscriber.
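One way to verify this: per-table overrides are stored in pg_class.reloptions, which, on a physical standby, is replicated like any other catalog data, so the same query works on the primary and the replica:

    -- List tables carrying per-table autovacuum (or other storage) overrides.
    SELECT relname, reloptions
    FROM pg_class
    WHERE reloptions IS NOT NULL
      AND relkind = 'r';     -- ordinary tables only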