PostgreSQL difference between VACUUM FULL and CLUSTER

clustered-indexpostgresqlpostgresql-9.1vacuum

I have a table with 200 GB of size occupied by data and 180 GB of size by the 6 indexes on it. It is 30% bloated, so I want to reclaim unwanted space occupied by it. It is clustered on job_id_idx index.

So to reclaim the space do I need to use cluster command or vacuum full command?

  1. What is the difference between this two command ?

  2. Is vacuum full order by some column same as cluster command?

  3. Is index recreated in both the commands?

  4. In my case which one will be faster?

The version of PostgreSQL database is 9.1

Best Answer

To check what CLUSTER does, I took a table fo mine from an earlier experiment which basically contained the first 10 million positive integers. I already deleted some rows and there is an other column as well but these only affect the actual table size, so it is not that interesting.

First, having run VACUUM FULL on the table fka, I took its size:

\dt+ fka
                    List of relations
 Schema | Name | Type  |  Owner   |  Size  | Description 
--------+------+-------+----------+--------+-------------
 public | fka  | table | test     | 338 MB | 

Then let's see the physical order of the data from the very beginning of the table:

SELECT *, ctid FROM fka ORDER BY ctid LIMIT 5;

 id  | col1 |  ctid   
-----+------+---------
   2 | 2    | (0,1)
   3 | 3    | (0,2)
   4 | 4    | (0,3)
   5 | 5    | (0,4)
   6 | 6    | (0,5)

Now let's delete some rows:

DELETE FROM fka WHERE id % 10 = 5;
--DELETE 1000000

After this, the reported table size did not change. So let's see now what CLUSTER does:

CLUSTER fka USING fka_pkey;

SELECT *, ctid FROM fka ORDER BY ctid LIMIT 5;

 id  | col1 |  ctid   
-----+------+---------
   2 | 2    | (0,1)
   3 | 3    | (0,2)
   4 | 4    | (0,3)
   6 | 6    | (0,4)
   7 | 7    | (0,5)

After the operation the table size changed from 338 to 296 MB. From the ctid column, which describes the physical place of the tuple in the page, you also see that there is no gap where the row matching id = 5 used to be.

As the tuples were reordered, indexes should have been recreated so that they point to the correct places.

So the difference looks to be that VACUUM FULL does not order the rows. As far as I know, there is some difference in the mechanism the two commands use but from a practical point of view this seems to be the main (only?) difference.