PostgreSQL – Alternative to CLUSTER Without Table Lock

clusteringlockingpostgresql

I'm facing performance degradation and increasing storage usage as a result of frequent new and updated records causing index and storage fragmentation.

VACUUM doesn't help much.

Unfortunetely CLUSTER isn't an option as it causes downtime and pg_repack isn't available for AWS RDS.

I'm looking for hacky alternatives to CLUSTER. One that seems to work fine in my local tests is:

begin;

create temp table tmp_target as select * from target;

delete from target;
insert into target select * from tmp_target order by field1 asc, field2 desc;

drop table tmp_target;

commit;

The ordering of ctid looks correct with:

select ctid, field1, field2 from target order by ctid;

The question is: does this look fine? Is it going to lock the target table for SELECT queries causing downtime in apps?
Is there a way to list the locks involved in the transaction?

Best Answer

If it's just about table bloat, VACUUM FULL is the tool to do it. (VACUUM only shrinks the physical file if it can opportunistically truncate it at the end.) However, VACUUM FULL also takes an exclusive lock on the table, just like CLUSTER.

To do the same without exclusive lock pg_repack is the tool you are looking for. It's unfortunate that Amazon does not seem to allow it.

VACUUM returning disk space to operating system

Manual solutions introduce all sorts of corner cases and race conditions. You would need to specify your table and possible access patterns exactly before we can discuss the best solution.

Your current solution is no good either way. You do the sort operation after deleting all rows (locking them), which will not shorten the downtime (the duration of the lock) for SELECT queries.

If you have no concurrent write access and no depending objects, this might be better:

BEGIN;

CREATE TEMP TABLE tmp_target AS TABLE target ORDER BY field1, field2 DESC;

TRUNCATE target;
INSERT INTO target TABLE tmp_target;

-- DROP TABLE tmp_target;  -- optional; dropped at end of session automatically

COMMIT;

Sort in the background before taking the lock.
Strictly speaking, INSERT without ORDER BY is free to write rows in any physical order. Practically, though, it will copy the current physical order of the temp table with a plain SELECT * FROM (or TABLE for short).
Use TRUNCATE instead of DELETE, which is faster for big tables. Be aware of some implications like TRUNCATE does not work with FK constraints. Read the manual for full coverage.

You may want to drop existing indexes right before TRUNCATE and recreate after INSERT to make this faster.

SQL Server Locking – How to Lock CREATE TABLE

You best bet is to use an explicit containing transaction and acquire a custom exclusive lock to protect the whole operation (SELECT and CREATE TABLE) using sp_getapplock. System objects do not honor isolation level requests and use locks in the same way as user tables, by design.

The race condition in the original code is that multiple threads can conclude the table does not exist before any thread gets as far as the CREATE TABLE statement.

PostgreSQL – Clustering Rows Without Exclusive Lock and Logging Overhead

You can do this without using the cluster command and having the table locked or generating WAL for the whole table. The cost is that you need to full-scan the table regularly.

The basic idea is:

turn off autovacuum for the table
check each block to determine the degree of clustering
delete and re-insert all the rows from blocks below a clustering threshold
manually vacuum to free those (complete) blocks
repeat steps 2-4 as regularly as necessary

test schema sample data initially 'part-clustered':

create schema stack;
set search_path=stack;
create type t_tid as (blkno bigint, rowno integer);
create table foo(host_id integer, bar text default repeat('a',400)) with (autovacuum_enabled=false);
insert into foo(host_id) select mod(g,10) from generate_series(1,500000) g order by mod(g,10);
insert into foo(host_id) select mod(g,10) from generate_series(1,500000) g;
create index nu_foo on foo(host_id);

initial clustering statistics:

select cn, count(*)
from ( select count(*) cn
       from (select distinct (ctid::text::t_tid).blkno, host_id from foo) z
       group by blkno ) z
group by cn
order by cn;
/*
 cn | count
----+-------
  1 | 27769  <---- half clustered
  2 |     8
  5 |     1
 10 | 27778  <---- half un-clustered
*/
select count(distinct (ctid::text::t_tid).blkno) from foo where host_id=1;
/*
 count
-------
 30558  <--------- lots of blocks to read for `host_id=1`
*/

initial analyze (2146.503 ms):

explain analyze select count(bar) from foo where host_id=1;
/*
                                                           QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=15097.30..15097.31 rows=1 width=32) (actual time=2146.157..2146.158 rows=1 loops=1)
   ->  Bitmap Heap Scan on foo  (cost=95.17..15084.80 rows=5000 width=32) (actual time=21.586..2092.379 rows=100000 loops=1)
         Recheck Cond: (host_id = 1)
         Rows Removed by Index Recheck: 286610
         ->  Bitmap Index Scan on nu_foo  (cost=0.00..93.92 rows=5000 width=0) (actual time=19.232..19.232 rows=100000 loops=1)
               Index Cond: (host_id = 1)
 Total runtime: 2146.503 ms
*/

delete and re-insert the un-clustered rows:

with w as ( select blkno
            from (select distinct (ctid::text::t_tid).blkno, host_id from foo) z
            group by blkno
            having count(*)>2 )
   , d as ( delete from foo
            where (ctid::text::t_tid).blkno in (select blkno from w)
            returning * )
insert into foo(host_id,bar) select host_id,bar from d order by host_id;
--
vacuum foo;

new clustering statistics:

select cn, count(*)
from ( select count(*) cn
       from (select distinct (ctid::text::t_tid).blkno, host_id from foo) z
       group by blkno ) z
group by cn
order by cn;
/*
 cn | count
----+-------
  1 | 55541  <---- fully clustered
  2 |    16
*/
select count(distinct (ctid::text::t_tid).blkno) from foo where host_id=1;
/*
 count
-------
  5558  <--------- far fewer blocks to read for `host_id=1`
*/

new analyze (48.804 ms):

explain analyze select count(bar) from foo where host_id=1;
/*
                                                          QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=16110.64..16110.65 rows=1 width=32) (actual time=48.760..48.761 rows=1 loops=1)
   ->  Bitmap Heap Scan on foo  (cost=131.18..16098.14 rows=5000 width=32) (actual time=8.402..32.439 rows=100000 loops=1)
         Recheck Cond: (host_id = 1)
         ->  Bitmap Index Scan on nu_foo  (cost=0.00..129.93 rows=5000 width=0) (actual time=7.636..7.636 rows=100000 loops=1)
               Index Cond: (host_id = 1)
 Total runtime: 48.804 ms
*/

clean up:

drop schema stack cascade;

The above is workable now, but is a bit quirky (needing to turn off auto-vacuum for the table) and requires regular full-scanning the table. I think something similar without the disadvantages could built into postgres. You'd need:

A space efficient index to cluster on (this is coming in 9.4 with GIN compression, or better still in 9.5 with the new BRIN index type)
A 'vacuum-like' process that would scan that index to detect which blocks need to be deleted/reinserted (this would ideally be able to reinsert the rows into fresh blocks so auto-vacuum can be left at default)

Best Answer

Related Solutions

SQL Server Locking – How to Lock CREATE TABLE

PostgreSQL – Clustering Rows Without Exclusive Lock and Logging Overhead

Related Question