You can do this without using the cluster
command and having the table locked or generating WAL for the whole table. The cost is that you need to full-scan the table regularly.
The basic idea is:
- turn off autovacuum for the table
- check each block to determine the degree of clustering
- delete and re-insert all the rows from blocks below a clustering threshold
- manually vacuum to free those (complete) blocks
- repeat steps 2-4 as regularly as necessary
test schema sample data initially 'part-clustered':
create schema stack;
set search_path=stack;
create type t_tid as (blkno bigint, rowno integer);
create table foo(host_id integer, bar text default repeat('a',400)) with (autovacuum_enabled=false);
insert into foo(host_id) select mod(g,10) from generate_series(1,500000) g order by mod(g,10);
insert into foo(host_id) select mod(g,10) from generate_series(1,500000) g;
create index nu_foo on foo(host_id);
initial clustering statistics:
select cn, count(*)
from ( select count(*) cn
from (select distinct (ctid::text::t_tid).blkno, host_id from foo) z
group by blkno ) z
group by cn
order by cn;
/*
cn | count
----+-------
1 | 27769 <---- half clustered
2 | 8
5 | 1
10 | 27778 <---- half un-clustered
*/
select count(distinct (ctid::text::t_tid).blkno) from foo where host_id=1;
/*
count
-------
30558 <--------- lots of blocks to read for `host_id=1`
*/
initial analyze (2146.503 ms):
explain analyze select count(bar) from foo where host_id=1;
/*
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=15097.30..15097.31 rows=1 width=32) (actual time=2146.157..2146.158 rows=1 loops=1)
-> Bitmap Heap Scan on foo (cost=95.17..15084.80 rows=5000 width=32) (actual time=21.586..2092.379 rows=100000 loops=1)
Recheck Cond: (host_id = 1)
Rows Removed by Index Recheck: 286610
-> Bitmap Index Scan on nu_foo (cost=0.00..93.92 rows=5000 width=0) (actual time=19.232..19.232 rows=100000 loops=1)
Index Cond: (host_id = 1)
Total runtime: 2146.503 ms
*/
delete and re-insert the un-clustered rows:
with w as ( select blkno
from (select distinct (ctid::text::t_tid).blkno, host_id from foo) z
group by blkno
having count(*)>2 )
, d as ( delete from foo
where (ctid::text::t_tid).blkno in (select blkno from w)
returning * )
insert into foo(host_id,bar) select host_id,bar from d order by host_id;
--
vacuum foo;
new clustering statistics:
select cn, count(*)
from ( select count(*) cn
from (select distinct (ctid::text::t_tid).blkno, host_id from foo) z
group by blkno ) z
group by cn
order by cn;
/*
cn | count
----+-------
1 | 55541 <---- fully clustered
2 | 16
*/
select count(distinct (ctid::text::t_tid).blkno) from foo where host_id=1;
/*
count
-------
5558 <--------- far fewer blocks to read for `host_id=1`
*/
new analyze (48.804 ms):
explain analyze select count(bar) from foo where host_id=1;
/*
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Aggregate (cost=16110.64..16110.65 rows=1 width=32) (actual time=48.760..48.761 rows=1 loops=1)
-> Bitmap Heap Scan on foo (cost=131.18..16098.14 rows=5000 width=32) (actual time=8.402..32.439 rows=100000 loops=1)
Recheck Cond: (host_id = 1)
-> Bitmap Index Scan on nu_foo (cost=0.00..129.93 rows=5000 width=0) (actual time=7.636..7.636 rows=100000 loops=1)
Index Cond: (host_id = 1)
Total runtime: 48.804 ms
*/
clean up:
drop schema stack cascade;
The above is workable now, but is a bit quirky (needing to turn off auto-vacuum for the table) and requires regular full-scanning the table. I think something similar without the disadvantages could built into postgres. You'd need:
- A space efficient index to cluster on (this is coming in 9.4 with GIN compression, or better still in 9.5 with the new BRIN index type)
- A 'vacuum-like' process that would scan that index to detect which blocks need to be deleted/reinserted (this would ideally be able to reinsert the rows into fresh blocks so auto-vacuum can be left at default)
Best Answer
uuid-ossp
modulePostgreSQL uses the standardized UUID generation algorithms provided by ITU-T Rec. X.667, ISO/IEC 9834-8:2005, and RFC 4122. From the docs on
uuid-ossp
,So long as the MAC address does not change, you'll be golden.
That all said, I agree with @a_horse_with_no_name,
In fact, given the chance of fewer collisions and more security, I would take it. And to that I would use
uuid_generate_v4()