PostgreSQL – how do you prevent dead rows from hanging around in PostgreSQL

postgresql

I have production and staging RDS instances on Amazon, and staging's data is a direct copy of production, so both instances hold the same data.

Doing an EXPLAIN ANALYZE SELECT * FROM my_table WHERE my_col = true; on staging resulted in this:

    Seq Scan on my_table  (cost=0.00..142775.73 rows=1 width=1436) (actual time=18170.294..18170.294 rows=0 loops=1)
      Filter: my_col
      Rows Removed by Filter: 360275

Whereas in production, it was:

    Seq Scan on my_table  (cost=0.00..62145.88 rows=1 width=1450) (actual time=282.487..282.487 rows=0 loops=1)
      Filter: my_col
      Rows Removed by Filter: 366442

When running select pg_total_relation_size('my_table'::regclass);

I found that staging's size was almost double that of production. From what I've read, I see that PostgreSQL's MVCC is responsible for this, as it keeps multiple versions of rows around. I manually ran VACUUM FULL and afterwards saw that staging's size had been cut down by about two-thirds. Running that same EXPLAIN ANALYZE now shows:

    Seq Scan on my_table  (cost=0.00..56094.75 rows=1 width=1436) (actual time=1987.340..1987.340 rows=0 loops=1)
      Filter: my_col
      Rows Removed by Filter: 360287
    Total runtime: 1987.547 ms

Which is great, but what I don't understand is that the documentation suggests autovacuum should kick in and clean up these dead rows, yet that clearly was not happening.
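
One way to sanity-check this is to look at the table's row in pg_stat_user_tables, which tracks the dead tuple count and the last time (auto)vacuum and (auto)analyze touched the table; a sketch:

    -- Dead tuple count and the last (auto)vacuum / (auto)analyze times for the table.
    SELECT relname,
           n_live_tup,
           n_dead_tup,
           last_vacuum,
           last_autovacuum,
           last_autoanalyze
    FROM   pg_stat_user_tables
    WHERE  relname = 'my_table';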

I've read several places that say "don't let your indexes get bloated", and I don't quite understand 1) how an index gets bloated, and 2) how to prevent an index from getting bloated.

How can I prevent this from happening again in the future?

UPDATE

Here are my autovacuum settings:

                name                 |  setting  | unit |  category  |                                        short_desc                                         | extra_desc |  context   | vartype | source  |  min_val  |  max_val   | enumvals | boot_val  | reset_val | sourcefile | sourceline
-------------------------------------+-----------+------+------------+-------------------------------------------------------------------------------------------+------------+------------+---------+---------+-----------+------------+----------+-----------+-----------+------------+------------
 autovacuum                          | on        |      | Autovacuum | Starts the autovacuum subprocess.                                                         |            | sighup     | bool    | default |           |            |          | on        | on        |            |
 autovacuum_analyze_scale_factor     | 0.1       |      | Autovacuum | Number of tuple inserts, updates, or deletes prior to analyze as a fraction of reltuples. |            | sighup     | real    | default | 0         | 100        |          | 0.1       | 0.1       |            |
 autovacuum_analyze_threshold        | 50        |      | Autovacuum | Minimum number of tuple inserts, updates, or deletes prior to analyze.                    |            | sighup     | integer | default | 0         | 2147483647 |          | 50        | 50        |            |
 autovacuum_freeze_max_age           | 200000000 |      | Autovacuum | Age at which to autovacuum a table to prevent transaction ID wraparound.                  |            | postmaster | integer | default | 100000000 | 2000000000 |          | 200000000 | 200000000 |            |
 autovacuum_max_workers              | 3         |      | Autovacuum | Sets the maximum number of simultaneously running autovacuum worker processes.            |            | postmaster | integer | default | 1         | 8388607    |          | 3         | 3         |            |
 autovacuum_multixact_freeze_max_age | 400000000 |      | Autovacuum | Multixact age at which to autovacuum a table to prevent multixact wraparound.             |            | postmaster | integer | default | 10000000  | 2000000000 |          | 400000000 | 400000000 |            |
 autovacuum_naptime                  | 60        | s    | Autovacuum | Time to sleep between autovacuum runs.                                                    |            | sighup     | integer | default | 1         | 2147483    |          | 60        | 60        |            |
 autovacuum_vacuum_cost_delay        | 20        | ms   | Autovacuum | Vacuum cost delay in milliseconds, for autovacuum.                                        |            | sighup     | integer | default | -1        | 100        |          | 20        | 20        |            |
 autovacuum_vacuum_cost_limit        | -1        |      | Autovacuum | Vacuum cost amount available before napping, for autovacuum.                              |            | sighup     | integer | default | -1        | 10000      |          | -1        | -1        |            |
 autovacuum_vacuum_scale_factor      | 0.2       |      | Autovacuum | Number of tuple updates or deletes prior to vacuum as a fraction of reltuples.            |            | sighup     | real    | default | 0         | 100        |          | 0.2       | 0.2       |            |
 autovacuum_vacuum_threshold         | 50        |      | Autovacuum | Minimum number of tuple updates or deletes prior to vacuum.                               |            | sighup     | integer | default | 0         | 2147483647 |          | 50        | 50        |            |

Best Answer

Autovacuum should eventually get around to cleaning it up (assuming you haven't disabled it), but it may not do so soon enough for your purposes. There are many settings that control autovacuum and how/when it runs, which may be of interest: here and here.
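
As a rough sketch of how those settings interact: autovacuum only vacuums a table once its dead tuples exceed autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor * reltuples. With the defaults shown in your update (50 and 0.2), a table of roughly 360,000 rows needs around 72,000 dead tuples before a worker will touch it. If that is too lazy for one particular table, the thresholds can be tightened per table instead of globally; the values below are only examples:

    -- Make autovacuum fire much earlier on this one table (example values).
    ALTER TABLE my_table SET (
        autovacuum_vacuum_scale_factor = 0.02,  -- vacuum once ~2% of rows are dead
        autovacuum_vacuum_threshold    = 100
    );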

This can be especially true of tables with high churn, that is, tables with lots of insertions, updates, and deletions. Long-running and idle-in-transaction sessions can also be a factor here, since MVCC has to keep the old row versions visible to them, which prevents the dead tuples from being reclaimed until those transactions finish. The fact that manually running a VACUUM did free the dead tuples suggests this isn't the case for you, though, and that it's more likely the former issue.
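
If you want to rule out the long-running-transaction case, pg_stat_activity shows any sessions that still have a transaction open (a sketch; the column names below assume a reasonably recent version, 9.2 or later):

    -- Open transactions, oldest first; long 'idle in transaction' sessions are
    -- the usual suspects for holding back dead tuple cleanup.
    SELECT pid, state, xact_start, now() - xact_start AS xact_age, query
    FROM   pg_stat_activity
    WHERE  xact_start IS NOT NULL
    ORDER  BY xact_start;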

In general, VACUUM FULL is not recommended for routine use, since it takes an exclusive lock on the table while it rewrites it; it is mainly intended for cases where most of the rows in a table have been updated or deleted.

From the doc:

The FULL option is not recommended for routine use, but might be useful in special cases. An example is when you have deleted or updated most of the rows in a table and would like the table to physically shrink to occupy less disk space and allow faster table scans. VACUUM FULL will usually shrink the table more than a plain VACUUM would.

Is your usage pattern such that this would be the case? You did mention a "direct copy" was involved, but it's not clear exactly how that's being done.
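
Either way, for routine cleanup the plain form is normally all you need; only the FULL form rewrites the table and holds an exclusive lock while it does so (shown here just to contrast the two):

    VACUUM my_table;       -- marks dead tuples reusable; runs alongside normal reads and writes
    VACUUM FULL my_table;  -- rewrites the table to return space to the OS; blocks all access meanwhile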

I have had cases with high-churn tables where the default autovacuum rate just wasn't enough, and even relatively small numbers of dead tuples would noticeably affect query speed (this was a large table that was queried very often and where the queries needed to be extremely fast, so it was particularly sensitive to dead tuples).

To help with this, I set up a manual VACUUM ANALYZE of the table (so it both frees up the dead tuples and aids the query planner by updating the statistics) in a cron job that runs every 5 minutes. Since there are never many dead tuples at that point, each VACUUM is fast, and the constant vacuuming keeps the dead tuple count low enough to keep queries of that table fast.
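
A minimal sketch of that kind of cron entry, run from a host that can reach the database (the database and table names are placeholders, and you'd add whatever connection options you normally use):

    # Every 5 minutes; "mydb" and "my_table" are placeholders.
    */5 * * * *  psql -d mydb -c "VACUUM ANALYZE my_table;"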

Edit in response to comment from OP:

In the VACUUM doc, it says that:

VACUUM reclaims storage occupied by dead tuples

The doc then says:

VACUUM ANALYZE performs a VACUUM and then an ANALYZE for each selected table. This is a handy combination form for routine maintenance scripts. See ANALYZE for more details about its processing.

So it definitely reclaims dead tuples.
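
And if you want to watch it do so, the VERBOSE option makes VACUUM report how many dead row versions it removed from the table and its indexes (a sketch):

    -- Prints per-table and per-index details about removed dead row versions,
    -- then refreshes the planner statistics.
    VACUUM (VERBOSE, ANALYZE) my_table;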