PostgreSQL – Backup and Restore Size Differences

postgresql

I have a WordPress database in PostgreSQL; phpPgAdmin shows this database with a size of 273 MB.

I ran sudo -u postgres pg_dump -Fp -f wordpress.sql wordpress to back up the database; the result is a file of just 18 MB.

I renamed the original database to wordpress2, and then:

I created a new database called wordpress and restored the dump with sudo -u postgres psql wordpress < wordpress.sql. The result is a database with a size of 19 MB. Is something missing?
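One way to check that nothing is actually missing is to compare per-table row counts between the two databases. A sketch, to be run in both wordpress and wordpress2 (run ANALYZE first so the statistics are fresh; matching counts suggest the restore is complete):

```sql
-- Approximate live row counts per table, from the statistics collector.
SELECT schemaname, relname, n_live_tup
FROM pg_stat_user_tables
ORDER BY n_live_tup DESC;
```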

Below is a screenshot of phpPgAdmin, with wordpress as the restored backup and wordpress2 as the original database.

[phpPgAdmin screenshot]

AutoVacuum

postgres=# select setting from pg_settings where name = 'autovacuum';
 setting
---------
 on
(1 row)

Indexes Space

postgres=# select pg_size_pretty(sum(pg_relation_size(indexrelid))) from pg_index;
 pg_size_pretty
----------------
 2600 kB
(1 row)
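Since the indexes account for only ~2.6 MB, the remaining space must be in the tables themselves. A query like the following (a sketch; run it inside the bloated database) shows where the 273 MB actually lives, counting heap, indexes, and TOAST together:

```sql
-- Ten largest tables by total on-disk size (heap + indexes + TOAST).
SELECT relname,
       pg_size_pretty(pg_total_relation_size(oid)) AS total_size
FROM pg_class
WHERE relkind = 'r'
ORDER BY pg_total_relation_size(oid) DESC
LIMIT 10;
```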

Database Sizes
Original

postgres=# select pg_size_pretty(pg_database_size('wordpress2'));
 pg_size_pretty
----------------
 273 MB
(1 row)

Restored

postgres=# select pg_size_pretty(pg_database_size('wordpress'));
 pg_size_pretty
----------------
 20 MB
(1 row)

I copied the original database:

CREATE DATABASE newdb WITH TEMPLATE wordpress2 OWNER postgres;

and performed a VACUUM FULL on this "newdb":

vacuumdb -f -d newdb -z -v

The result: "newdb" decreased from 273 MB to 17 MB.

postgres=# select pg_size_pretty(pg_database_size('newdb'));
 pg_size_pretty
----------------
 17 MB
(1 row)

Best Answer

I will answer my own question.

From: https://stackoverflow.com/questions/13505785/postgresql-database-size-is-less-after-backup-load-on-heroku

The reason for this is its MVCC system. Every time you UPDATE a record, the database creates another "version" of that record instead of rewriting the previous one. These "outdated" records are deleted by the VACUUM process once nothing needs them anymore.

I searched the PostgreSQL docs and found: Unlike traditional database systems which use locks for concurrency control, PostgreSQL maintains data consistency by using a multiversion model (Multiversion Concurrency Control, MVCC). This means that while querying a database each transaction sees a snapshot of data (a database version) as it was some time ago, regardless of the current state of the underlying data. This protects the transaction from viewing inconsistent data that could be caused by (other) concurrent transaction updates on the same data rows, providing transaction isolation for each database session.

The main advantage to using the MVCC model of concurrency control rather than locking is that in MVCC locks acquired for querying (reading) data do not conflict with locks acquired for writing data, and so reading never blocks writing and writing never blocks reading.
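The bloat that MVCC produces is easy to demonstrate. A minimal sketch (the table name mvcc_demo is hypothetical); after one full-table UPDATE, every row has a dead predecessor version that only VACUUM can reclaim:

```sql
-- Create a small table and update every row once.
CREATE TABLE mvcc_demo (id int, payload text);
INSERT INTO mvcc_demo SELECT g, 'x' FROM generate_series(1, 1000) g;
UPDATE mvcc_demo SET payload = 'y';  -- leaves 1000 dead row versions behind

-- Once the statistics collector catches up, n_dead_tup should be ~1000:
SELECT n_live_tup, n_dead_tup
FROM pg_stat_user_tables
WHERE relname = 'mvcc_demo';
```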

From: https://stackoverflow.com/questions/924993/postgresql-database-size-increasing If the dead tuples have stacked up beyond what can be accounted for in max_fsm_pages, a regular VACUUM will not be able to free everything. The end result is that the database will grow larger and larger over time as dead space continues to accumulate. Running a VACUUM FULL should fix this problem. Unfortunately it can take a very long time on a large database.

If you're running into this problem frequently, you either need to vacuum more often (autovacuum can help here) or increase the max_fsm_pages setting. When running VACUUM VERBOSE it will tell you how many pages were freed and give you a warning if max_fsm_pages was exceeded, this can help you determine what this value should be. See the manual for more information.
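To get the page-level detail the answer above refers to, VACUUM can be run with the VERBOSE option on a single table (wp_posts here is just an example name; the exact wording of the output, including any max_fsm_pages warning on pre-8.4 servers, varies by version):

```sql
-- Reports pages scanned, dead tuples removed, and pages that could not
-- be reclaimed, per table and per index.
VACUUM VERBOSE wp_posts;
```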

From: postgres backup / restore: restored database much smaller?

Maybe it's just index bloat. VACUUM FULL does not help with index bloat, on the contrary, as said in the doc for 8.4:

The FULL option is not recommended for routine use, but might be useful in special cases. An example is when you have deleted or updated most of the rows in a table and would like the table to physically shrink to occupy less disk space and allow faster table scans. VACUUM FULL will usually shrink the table more than a plain VACUUM would. The FULL option does not shrink indexes; a periodic REINDEX is still recommended. In fact, it is often faster to drop all indexes, VACUUM FULL, and recreate the indexes.

(in more recent versions, this advice has disappeared because VACUUM FULL has been reimplemented differently).

See Routine Reindexing and the REINDEX command.

The easiest way to reindex is to connect to the database with the db user that owns it and issue:

REINDEX DATABASE test_db;

Ideally, it should be done immediately after VACUUM FULL and the database should shrink to its lowest possible size at this point in time.
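Putting the pieces together, one possible shrink sequence looks like this (a sketch, using the original database from the question; VACUUM FULL takes exclusive locks on each table, so run it in a maintenance window, connected as the database owner):

```sql
-- Rewrite every table compactly, then rebuild the indexes,
-- then check the resulting database size.
VACUUM FULL VERBOSE;
REINDEX DATABASE wordpress2;
SELECT pg_size_pretty(pg_database_size('wordpress2'));
```

On 9.0 and later, VACUUM FULL rewrites the table (and its indexes) from scratch, so the separate REINDEX step matters mostly for older releases such as the 8.4 docs quoted above.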