Postgresql – When to ANALYZE/VACUUM/REINDEX when adding primary keys and indexes

postgresqlpostgresql-11

I have a recurring job that creates a Postgres (v11) database from a template and loads it with data. The scale of the data is on the order of 10s or 100s of millions of rows, so for performance reasons I bulk insert the data before creating any primary keys or indexes. This is my current order of operations:

  1. create the tables
  2. bulk COPY all of the data
  3. create primary keys
  4. create additional indexes

Creating the primary keys and the indexes are the slowest parts of this workflow. Is there a point in this workflow when recalculating statistics via ANALYZE/VACUUM/REINDEX would be helpful or even necessary?

Best Answer

You should probably do a VACUUM as step 2.5. If you do it as step 5, or allow the autovac to do it implicitly (as step 5+something), then you will be uselessly vacuuming indexes which don't really need it.

If any of the indexes are expressional indexes, you should do ANALYZE as step 5. Otherwise, make step 2.5 be VACUUM ANALYZE. Expressional indexes have their own statistics gathered on them, but of course if they don't exist at the time of the ANALYZE, that doesn't work.