Postgresql – Reindexing a large database

data-warehouseindexpostgresql

I have a very large database (millions of rows) which has been indexed.
I receive a diff of rows from another source that I need to update in my database.

The problem I am facing is that is it necessary to re-index the entire database/table again since it is a very expensive process and I don't want our site to be down for the period it is being indexed.
Is there a better way or a way to add the partial index to the original one so that I do not need to reindex everything ?

I am using postgresql 9.1 as a database.

Best Answer

After understanding your use better, I think the real answer is: you have no need to REINDEX.

REINDEX recreates the entire index. But indexes are maintained incrementally; as rows are inserted and deleted, or updated with new key values, the index is updated to reflect the change, as well as prior versions.

Because of the way MVCC works, when you delete a row, postgres can't actually delete the row or the index entry immediately, as there may be transactions that still have visibility to the row, so it just marks it deleted. VACUUM is run periodically to garbage collect dead tuples, But indexes can still retain bloat where there are dead entries to old row versions. So, frequent updates to indexed columns or massive deletes can produce many dead index entries which make the key density of index blocks lower and produce inefficiencies in index operations. This is called "index bloat".

You can detect whether you're suffering it using some scripts which inspect catalog views and calculate bloat (available through the prior link). If you do incur index bloat, periodic REINDEX operations may be called for. In your use case, you have several million rows, and on a monthly basis delete or modify patches on the order of a few hundred rows. Over the course of a year, this access pattern is likely to touch only a small fraction of a percent of the total rows, so any index bloat resulting will be minuscule and not justify the expense of a REINDEX operation.