Postgresql – Database size reduced after backup on PostgreSQL 8.3 and restore in PostgreSQL 9.4

disk-spacepostgresqlrestore

I did a pg_dump on a JIRA database that I was hosting in a PostgreSQL 8.3 server. Size of the database after vacuum full was 217132652 (approximately 207 MB).

Then I restored that JIRA database on a PostgreSQL 9.4 server with the following command:

$ psql -X -v ON_ERROR_STOP=1 -d jira2 -U jira -h localhost < jiradb2017_03_12.sql

I assume the restore would exit on any error since I used ON_ERROR_STOP=1, but the SQL script finished correctly (despite some warnings not related to data restore).

I ended up with a database with a size of 158019348 (approximately 151 MB).

So, what's the story here? Can I just assume database was restored successfully and PostgreSQL optimized its storage (somewhere between 8.3 and 9.4 versions) engine and is using space more efficiently?

Best Answer

When you restore a database you have all the information on it packed, with no empty space between rows (or in indices), unless some specific settings are in place (basically: FILLFACTOR for tables and FILLFACTOR for indices).

On the other hand, when your database has been in use for some time, and you've had your share of inserts, updates and deletes, free unused space will appear. This is because of the way PostgreSQL and Multiversion Concurrency Control, a.k.a. MVCC work. MVCC allows for less lockings, which basically means you save time. But you pay a price in terms of space:

  1. Every UPDATE is equivalent to an INSERT together with a DELETE, with the overhead (at least in terms of space used) associated to both.
  2. When you have several transactions running, and every one is INSERTing, UPDATEing or DELETEing, you have simultaneously several copies of every row involved.
  3. The space allocated to these row versions won't be freed immediately after commit, and for a while, will be unused space within the files where your table data (and indexes) is being stored.

Autovacuum takes care of this space being made reusable by default, or you could have some specific procedure for routine vacuuming.

This fact already can explain the size change.

Optimizations between versions probably also took place; and can explain further improvements. Optimizations could also have been made for speed and not for size, and the actual size could actually grow from one version to the next. I really don't know the specifics to be able to tell; although the comment from @Erwin states that both changes making your tables shrink and changes making your tables bloat (grow) have taken place since version 8.3.

To distinguish between the two effects, if you're curious, you could just, as @Jack Douglas suggests, restore your database on 8.3. It will most probably shrink in size. If it shrinks to less than 151 MB (a size smaller that what you get with 9.4 version), then the removal of unused space made your DB shrink, and the version change actually made your DB grow.


For a better understanding of MVCC, look at Bruce Momjian's presentation.