VACUUM rewrites the entire block, efficiently packing the remaining rows and leaving a single contiguous block of free space (though this space isn't zeroed and the physical disk file might contain the remnants of deleted rows which of course are in no way visible to the database user).
test schema:
--#psql postgres postgres
select oid from pg_database where datname='postgres';
/*
oid
-------
12035
*/
create schema stack;
set search_path=stack;
create table foo(bar text);
insert into stack.foo(bar) values('row 1');
insert into stack.foo(bar) values('row 2');
checkpoint;
select * from foo;
/*
bar
-------
row 1
row 2
*/
select relfilenode
from pg_class c join pg_namespace n on n.oid=c.relnamespace
where nspname='stack' and relname='foo';
/*
relfilenode
-------------
446488
*/
physical contents of table backing file:
xxd -a /var/lib/postgresql/9.3/main/base/12035/446488
0000000: 3800 0000 f093 9d83 0000 0000 2000 c01f 8........... ...
0000010: 0020 0420 0000 0000 e09f 3c00 c09f 3c00 . . ......<...<.
0000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
*
0001fc0: 2176 4c14 0000 0000 0000 0000 0000 0000 !vL.............
0001fd0: 0200 0100 0208 1800 0d72 6f77 2032 0000 .........row 2.. <---- this is row 2
0001fe0: 2176 4c14 0000 0000 0000 0000 0000 0000 !vL.............
0001ff0: 0100 0100 0208 1800 0d72 6f77 2031 0000 .........row 1..
delete and vacuum:
delete from foo where bar='row 1';
vacuum stack.foo;
checkpoint;
physical contents of table backing file:
xxd -a /var/lib/postgresql/9.3/main/base/12035/446488
0000000: 3800 0000 e8b5 9e83 0000 0500 2000 e01f 8........... ...
0000010: 0020 0420 0000 0000 0000 0000 e09f 3c00 . . ..........<.
0000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
*
0001fc0: 2176 4c14 0000 0000 0000 0000 0000 0000 !vL.............
0001fd0: 0200 0100 0209 1800 0d72 6f77 2032 0000 .........row 2.. <---- this is free space
0001fe0: 2176 4c14 0000 0000 0000 0000 0000 0000 !vL.............
0001ff0: 0200 0100 0209 1800 0d72 6f77 2032 0000 .........row 2..
insert new row:
insert into stack.foo(bar) values('row 3');
checkpoint;
final physical contents of table backing file:
xxd -a /var/lib/postgresql/9.3/main/base/12035/446488
0000000: 3800 0000 c8ec 9e83 0000 0100 2000 c01f 8........... ...
0000010: 0020 0420 0000 0000 c09f 3c00 e09f 3c00 . . ......<...<.
0000020: 0000 0000 0000 0000 0000 0000 0000 0000 ................
*
0001fc0: 2476 4c14 0000 0000 0000 0000 0000 0000 $vL.............
0001fd0: 0100 0100 0208 1800 0d72 6f77 2033 0000 .........row 3.. <---- this is row 3
0001fe0: 2176 4c14 0000 0000 0000 0000 0000 0000 !vL.............
0001ff0: 0200 0100 0209 1800 0d72 6f77 2032 0000 .........row 2..
clean up:
drop schema stack cascade;
Perhaps you could start with:
SELECT *
FROM pg_stat_all_indexes
WHERE schemaname NOT IN ( 'pg_catalog', 'pg_toast' )
AND idx_scan = 0
AND idx_tup_read = 0
AND idx_tup_fetch = 0
AND indexrelid NOT IN (
SELECT indexrelid
FROM pg_index
WHERE indisunique = true
OR indisprimary = true ) ;
No guarantee that the indexes that pop up aren't ever used, but it should provide a list to start looking at.
Best Answer
PostgreSQL leaves dead data in the table; the space can be reused, but the files won't shrink (significantly).
The official method to reclaim space is
VACUUM (FULL)
, but that rewrites the whole table, which will be unavailable for any access during that time. There are extensions calledpg_squeeze
andpg_repack
which do the same thing with less disruption "behind the scenes".All these methods have in common that they will require enough free space to create a copy of the table, so you probably won't get around increasing the storage space anyway.
Now for the good news:
If you run a plain
VACUUM
on the table, which is not disruptive, the wasted space can be reused. So your next attempt to load the data won't increase the size of the table.