Postgresql – postgres backup / restore: restored database much smaller

postgresql

I am worried that my restored database is very different from the original:

#check size of postgres database
postgres@db1:/tmp$ psql -c "select pg_size_pretty(pg_database_size('test_db'));"
 pg_size_pretty
----------------
 2105 MB
(1 row)

#backup database
postgres@db1:/tmp$ pg_dump -Fc test_db > test_db_Fc.dump

#rename postgres database (i.e. park it nearby)
postgres@db1:/tmp$ psql -c "alter database test_db rename to test_db_20130322;"
ALTER DATABASE
-------
(1 row)

#restore test_db
postgres@db1:/tmp$ pg_restore -Fc -C -d postgres test_db_Fc.dump

#check size of restored postgres database
postgres@db1:/tmp$ psql -c "select pg_size_pretty(pg_database_size('test_db'));"
 pg_size_pretty
----------------
 257 MB
(1 row)

The original db is many times larger than the restored one. What is going on here?
As far as I can tell, the website that test_db serves is still working perfectly well, after a restore, but I need to know what's up before i use a backup in live context.

FYI if I run vacuumdb on each database there seems to be no change in database size.

[Addendum, added later]
In the tradition of RTFM I have gone hunting in the manual for PostgreSQL.
Here are some more pieces to the puzzle

#is the autovacuum switched on?
postgres@db1:/tmp$ psql -c "SHOW autovacuum;"
 autovacuum
------------
 on
(1 row)

#The "track counts" parameter is needed by autovacuum which uses statistics from database activity to know where to do its job
postgres@db1:/tmp$ psql -c "SHOW track_counts;"
 track_counts
--------------
 on
(1 row)

#is there an autovacuum daemon resident in memory?
postgres@db1:/tmp$ ps  -ef | grep 'autovacuum'
postgres  1261  1021  0 Jan23 ?        00:08:27 postgres: autovacuum launcher process                             
postgres 18347 18149  0 00:33 pts/0    00:00:00 grep autovacuum


#what's been happening on the live server?
postgres@LIVEdb1:/tmp$ psql -c "SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze from pg_stat_all_tables;"
#result is list of 65 tables (out about 300), all empty (no dates at all)

Gratefully following up on @craig-ringer's advice to look into VACUUM FULL I turned to PostgreSQL documentation, (paraphrased) "… In the worst case where your disk is nearly full, VACUUM FULL may be the only workable alternative. … (but) the usual goal of routine vacuuming is to avoid needing VACUUM FULL. The autovacuum daemon attempts to do standard VACUUMs often enough to maintain steady-state usage of disk space …"

I will follow @chris-travers advice and map out the number of rows in tables from each version of the database. I think in my case it'd be fair to say that VACUUM FULL will relieve the pressure on disk space, and it'll make original_vs_restored look better but there's still this red flag of excessive bloat. I think autovacuum isn't doing anything and that's a worry! Thanks for the guidance so far, it's fascinating.

Best Answer

Maybe it's just index bloat. VACUUM FULL does not help with index bloat, on the contrary, as said in the doc for 8.4:

The FULL option is not recommended for routine use, but might be useful in special cases. An example is when you have deleted or updated most of the rows in a table and would like the table to physically shrink to occupy less disk space and allow faster table scans. VACUUM FULL will usually shrink the table more than a plain VACUUM would. The FULL option does not shrink indexes; a periodic REINDEX is still recommended. In fact, it is often faster to drop all indexes, VACUUM FULL, and recreate the indexes.

(in more recent versions, this advice has disappeared because VACUUM FULL has been reimplemented differently).

See Routine reindeing and the REINDEX command.

The easiest way to reindex is to connect to the database with the db user that owns it and issue:

REINDEX database test_db;

Ideally, it should be done immediately after VACUUM FULL and the database should shrink to its lowest possible size at this point in time.