Postgresql updating integer column taking more space than expected

postgresqlpostgresql-9.1update

I have two tables, A(a1 serial, a2 varchar(8)) and B(b2 varchar(8));

My query looks like,

alter table B add column b1 integer;
update B t1 set b1 = t2.a1 from (select b1, b2 from B) t2 where t1.a2 = t2.b2;

Here's my problem:

B was originally around 46 GB and had 6*10^7 entries. According to the documentation integers take up 8 bytes. Thus, 4 * 6 * 10^7 = 24 * 10^7 < 1 GB. The query is still executing. When I log in from another console and do

select pg_size_pretty(pg_relation_size('B'));

I see 60 GB. I'm expecting the size to be more than this when the query ends.

I'm confused by this hike in size. Is it because of the temporary data structures that it creates? If so could I expect the size to go down if i use VACUUM?

I'm using postgres 9.1.

Best Answer

Well, you are running this giant UPDATE statement of all rows in table B:

update B t1 set b1 = t2.a1 from (select b1, b2 from B) t2 where t1.a2 = t2.b2;

which will more-or-less double the size of the table, since the UPDATE has to keep old row versions around. You may want to read up a bit on how PostgreSQL implements MVCC and how vacuuming works, but to answer this question:

If so could I expect the size to go down if i use VACUUM?

The space utilized by the table should go down if you run a VACUUM FULL or CLUSTER; probably a VACUUM alone will not be sufficient to immediately reclaim space.