Postgresql – For alignment optimized table is bigger than original table – why

database-designdatatypesdisk-spacepostgresqlvacuum

In another question I learned that I should optimize the layout from one of my tables to save space and have better performance. I did this, but ended up with a larger table than before and performance did not change. Of course I did a VACUUM ANALYZE. How comes?

(I see that the index sizes won't change if I only index single columns.)

This is the table I was coming from (I've added sizes + padding):

                               Table "public.treenode"
    Column     |           Type           | Size |          Modifiers
---------------+--------------------------+------+-------------------------------
 id            | bigint                   | 8    | not null default nextval( ...
 user_id       | integer                  | 4+4  | not null
 creation_time | timestamp with time zone | 8    | not null default now()
 edition_time  | timestamp with time zone | 8    | not null default now()
 project_id    | integer                  | 4    | not null
 location      | real3d                   | 36   | not null
 editor_id     | integer                  | 4+4  |
 parent_id     | bigint                   | 8    |
 radius        | real                     | 4    | not null default 0
 confidence    | smallint                 | 2    | not null default 5
 skeleton_id   | integer                  | 4    | not null

With real3d being defined as

CREATE TYPE real3d AS (
  x real,
  y real,
  z real);

I changed this layout to the following:

                                Table "public.treenode_new"
    Column     |           Type           | Size |            Modifiers
---------------+--------------------------+------+--------------------------------
 id            | bigint                   | 8    | not null default nextval(' ...
 project_id    | integer                  | 4    | not null
 location_x    | real                     | 4    | not null
 location_y    | real                     | 4    | not null
 location_z    | real                     | 4    | not null
 editor_id     | integer                  | 4    | not null
 user_id       | integer                  | 4    | not null
 creation_time | timestamp with time zone | 8    | not null default now()
 edition_time  | timestamp with time zone | 8    | not null default now()
 skeleton_id   | integer                  | 4    | not null
 radius        | real                     | 4    | not null default 0
 confidence    | real                     | 4+4  | not null default 5
 parent_id     | bigint                   | 8    |

If I am not mistaken, I should save 66 bytes per row (138 is one original row, 72 is a new row). This however is not happening: With 7604913 in these tables, the original table had a size of 1020 MB. The new table has a size of 1159 MB. I used pg_size_pretty(pg_relation_size('<tablename>')) to measure the sizes. So what am I missing?

One note: all but the last four columns are inherited from another table (of which I of course had to change the layout, too).

Update: After running VACUUM FULL as suggested by Erwin Brandstetter, the new table needs only 734 MB.

Best Answer

The size of the physical table is typically (except for opportunistic pruning of removable pages from the end of the table) not reduced by running VACUUM (or VACUUM ANALYZE). You need to run VACUUM FULL to actually shrink the table.

That's not necessarily what you want to do on a regular basis if you have write load on your table. Dead rows provide wiggle room for UPDATEs to place updated row versions on the same data page, which allows for better performance. There is also a cost to shrinking and growing the physical table of a relation. Plus, VACUUM FULL takes out an exclusive lock on the table.
That's why autovacuum only runs VACUUM (and ANALYZE) not VACUUM FULL.

Read-only (or mostly-read) tables are best kept to their minimum size, though. And excessive bloat after changes to the table definition (or for some other reason) is best removed immediately, too.

Try VACUUM FULL on both versions of the table and measure again. The difference should show.

You can also try my query with multiple tests for more information on row / table sizes.