For testing purposes I periodically restore production backup into a new database (AWS RDS PostgreSQL instance) with pg_restore
. In the new database I run SQL queries that anonymize sensitive data. However, a query like below updates over 20M records in a table that has ~10 indexes (created during pg_restore
).
UPDATE users SET email = users.id || '@example.com',
phone_number = NULL;
This takes a few hours on current setup.
To speed things up, I wanted to disable the table's indexes just for running the update
and then enable them again and reindex the table. How can I do this? Any other ways to speed up this UPDATE
?
Notes:
-
I've tried
UPDATE pg_index SET indisready=false WHERE indrelid = ( SELECT oid FROM pg_class WHERE relname='users' );
but got the error below.
ERROR: permission denied for relation pg_index
Apparently you can't do this as superuser in AWS RDS PostgreSQL instance.
- Dropping the indexes and then creating them again wouldn't be convenient, because I would have to know what columns they cover in advance. Ideally I would have a command that store existing index configuration in some temporary variable and then recreate them again using this info.
Best Answer
When you restore the data, use
pg_restore --section=pre-data --section=data
. This will leave off the indexes.Then run the UPDATEs you want to, then run
pg_restore --section=post-data