Postgresql – Persistence of a PostgreSQL visibility map through RDS snapshot and restore

amazon-rdspostgresqlsnapshot

I have a pretty huge (1,2 TB) DB on a big RDS PostgreSQL 9.6.1 instance (db.m4.4xlarge). The DB is static (no on-line transactions) and we are testing performance of scripts (mainly analytical queries).

The DB was built from a plaintext SQL scripts backup (CREATE TABLE + COPY), so it has no dead tuples. After the restore, I ran just a complete ANALYZE (not a VACUUM).

After the ANALYZE, I saved the DB in a snapshot and deleted the instance.
I would like to start often from that snapshot as a known and stable test bed.

I noticed that the first execution of scripts (e.g. select count(*) from table) after the restore from that snapshot is way too long (i.e. 50min for count(*) of a 116 mln rows table). The subsequent execution of the same query lasts around 15 sec.

It seems to me that 50 minutes can't be explained by the reading of table from SSD instead of RAM cache.

I discovered that visibility map of all tables after the restore+analyze+snapshot wasn't compiled (SELECT ... relpages, relallvisible FROM pg_class...), and the first execution of the query had to do the job of compiling it.

So I ran a complete VACUUM ANALYZE. I saw the performance improvement in the query and took a new snapshot, aiming to save a better test bed (I haven't found documentation about persistence of visibility map in PostgreSQL backups or in RDS snapshots).

Unfortunately I discovered that, after taking snapshot and restoring from it, first execution is still long (no improvement). Visibility map seems to be lost, despite the (SELECT ... relpages, relallvisible FROM pg_class...) showing good values for relallvisible.

Are my findings wrong?

There is a way to have persistence of the visibility map through a RDS snapshot and restore?

There is a documentation to study useful for this aim?

Best Answer