Postgresql – Improving PostgreSQL cluster performance

performancepostgresql-9.4

I am trying to cluster my table faster. The table is 27 GB worth of data, and has one index using a point geometry which is 19 GB in size. I am seeing about 5.5 hours to cluster the table.

I am assuming clustering is a IO heavy process, so anything to improve read/write times (new disk) would be my first thought, but are there any other things that could improve the time it takes to cluster a table?

I ran this test on a machine with an SSD and 12 GB of RAM, using postgresql 9.4

Best Answer

I found an interesting solution which you might be interested in. Basically, it involves creating a temporary clone of the table, where the data is copied into (this does the clustering). This will keep your original table available. After the insert is completed; you use the copy as new source; and discard the original (unclustered) table.

The table is partitioned into three child tables: static1, static2, and latest. The parent table has a rule that redirects all inserts into the "latest" partition. This serves as the data collection point, where new rows sit until they are migrated into one of the other two partitions. The "static1" and "static2" partitions serve as front and back buffers for the cluster operation. Only one partition will actually have any rows at any given time (the front buffer); the other will be empty (the back buffer).

You can also have a look at pg_repack; which claims to provide an alternative to VACUUM FULL without the exclusive lock. This plugin basically does the same as the blog above does manually:

To perform full table repacks, pg_repack creates a work table in the "repack" schema and sorts the rows in this table. Then, it updates the system catalogs directly to swap the work table and the original one.

Alternatively, playing with maintenance_work_mem might be a simple fix; increasing that might be enough.