PostgreSQL 8.4 – Should VACUUM ANALYZE Be Done Before REINDEXing?

indexpostgresqlpostgresql-8.4

Early in the morning every day a pgAgent job refreshes the contents of table A from table B on my PostgreSQL 8.4 database. Table A contains around 140k records across 91 columns and has two indexes – one as part of the PRIMARY KEY and the other a GIST index on a POINT PostGIS geometry column.

To make the process go a little faster the job drops the index on the geometry column, before deleting the records in the table A and inserting the records from table B, then the index is recreated. This all being done the autovacuum daemon gets to work when it feels like it (after ten minutes or so from comparing the job stats and table stats for the job completion time and autovacuum run time).

Upon checking on the table this morning after all this had happened the table stats told me the table size was 272MB, the TOAST table size was 8192bytes, and the index size was 23MB. This seemed quite large so I issued a REINDEX command on the table and the index size came down to 9832kB.

My question(s) is this:

Why does the REINDEX apparently reduce the size of the indexes so much when the indexes (or at least the geometry column index) have been built anew from scratch? Should I make sure that the table has been vacuumed/analyzed before the indexes are built? Is not dropping the index on the primary key a factor in this? What am I missing?

Best Answer

If the CREATE INDEX statement sees that another session holds an active snapshot that might still be interested in the deleted records, then it includes those deleted records into the new index.

Similarly, if a REINDEX sees that another session holds an active snapshot that might still be interested in the deleted records, then it includes those deleted records into the new index.

If a VACUUM sees that another session holds an active snapshot that might still be interested in the deleted records, then it keeps those records in the table. And then the REINDEX or CREATE INDEX also need to carry them into the new index, as long a the snapshot still exists.

Once there or no longer any snapshots that could possible see the deleted rows, then the VACUUM may remove them from the table. But a CREATE INDEX or REINDEX could also just not carry them over into the new index, whether VACUUM had gotten around to removing them from the able or not.

So in your scenario, the role of the VACUUM between the initial CREATE INDEX and REINDEX is probably just to take up time, during which time your long-running transaction hopefully goes away on its own and drops the interfering snapshot.