I believe it won't work as well, and here's why.

A Mongo ObjectID is 12 bytes, larger than an `INT` (4 bytes) or a `BIGINT` (8 bytes). Remember, in InnoDB every secondary index references the primary key, so a wide primary key bloats all the other indexes. This might not be a concern, but it's something to be aware of.
While the timestamp portion is sequential, the machine id, process id, and random value are not always going to be, so values generated within the same second won't always sort in ascending order.
So, in conclusion, I don't believe the Mongo ObjectID will perform as well as a smaller, ascending key.

That said, you can still use it as a primary key. It doesn't need to be ascending; it just means you may get index leaf-page splits on inserts, which, if they happen close enough to the end of the index, won't hurt performance much. And you can always rebuild the table to compact the indexes.
So, more about the leaf splits. By default, InnoDB aims for a 15/16 page fill, leaving 1/16th free for future changes. When data is inserted in sequential order, the index pages end up consistently filled to 15/16.

When data is inserted randomly/out of order, InnoDB keeps the pages between 1/2 and 15/16 full, depending on what it estimates the final fill rate of each index page will be. This obviously wastes disk space, and memory when the index is loaded into the buffer pool.
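If that fragmentation does become a problem, rebuilding the table repacks the pages. A minimal sketch (the table name is hypothetical):

```sql
-- For InnoDB, OPTIMIZE TABLE is mapped to ALTER TABLE ... FORCE,
-- which rebuilds the clustered index and all secondary indexes,
-- repacking the pages toward the default fill.
OPTIMIZE TABLE tbl;

-- Equivalent explicit rebuild:
ALTER TABLE tbl FORCE;
```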
I suggest a function taking a `regclass` parameter that runs `ALTER SEQUENCE` with a new randomly generated increment before it returns the next value from the given sequence. It can be used as a drop-in replacement for `nextval()`.

Per the documentation on `ALTER SEQUENCE`:

> `increment`
>
> The clause `INCREMENT BY increment` is optional. A positive value will make an ascending sequence, a negative one a descending sequence. If unspecified, the old increment value will be maintained.
However:

> You must own the sequence to use `ALTER SEQUENCE`.

So we need to take care of privileges. You could make the function `SECURITY DEFINER` and owned by a superuser. If you don't `REVOKE` privileges from `public`, it works for anyone on any sequence. There are two basic strategies to restrict usage:
- To allow for selected sequences only, change the owner of those sequences to some dedicated role, say `randseq`, and make `randseq` own the function (still with `SECURITY DEFINER`).
- To allow for selected roles only, `REVOKE` all privileges on the function from `public` and `GRANT EXECUTE` to said roles. You might use a group role to simplify privilege management.
Or combine both:
```sql
CREATE OR REPLACE FUNCTION nextval_rand(regclass)
  RETURNS int AS
$func$
BEGIN
   EXECUTE format('ALTER SEQUENCE %s INCREMENT %s'
                , $1                           -- regclass is automatically sanitized
                , (random() * 100)::int + 1);  -- random increment between 1 and 101
   RETURN nextval($1)::int;
END
$func$ LANGUAGE plpgsql SECURITY DEFINER;

-- to restrict usage:
ALTER FUNCTION nextval_rand(regclass) OWNER TO randseq;
REVOKE ALL ON FUNCTION nextval_rand(regclass) FROM public;
GRANT EXECUTE ON FUNCTION nextval_rand(regclass) TO randseq;
GRANT randseq TO ???;
```
Call:

```sql
SELECT nextval_rand('tbl_tbl_id_seq'::regclass);
```
All you have to do now is replace `nextval()` with `nextval_rand()` in the column default of any serial column. And possibly change the owner of the sequence.

```sql
ALTER SEQUENCE tbl_tbl_id_seq OWNER TO randseq;
ALTER TABLE tbl ALTER COLUMN tbl_id SET DEFAULT nextval_rand('tbl_tbl_id_seq'::regclass);
```
Notes

- `ALTER SEQUENCE` is designed not to block concurrent transactions. It takes effect immediately and cannot be rolled back. It should work reliably in a multi-user environment. Read the Notes section of the manual page for the fine print of `ALTER SEQUENCE` behavior.
- There is a very slim chance of a race condition, where two concurrent operations each run `ALTER SEQUENCE` before calling `nextval()`. Since we are operating with random numbers anyway, this really doesn't matter.
- Since we are running dynamic SQL, I would normally `SET search_path = public, pg_temp` for the function. But since the parameter is `regclass`, only valid sequence names can be passed, and they are automatically schema-qualified and escaped unambiguously.
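As a quick illustration of why the `regclass` parameter guards against injection (the first sequence name is hypothetical):

```sql
-- A regclass cast is resolved against the system catalogs,
-- so only names of existing relations are accepted:
SELECT 'tbl_tbl_id_seq'::regclass;       -- OK, if the sequence exists
SELECT 'x; DROP TABLE tbl'::regclass;    -- ERROR: injection attempt rejected
```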
Best Answer
It depends on your generation function and the size of the final tables.

GUIDs are intended to be globally unique identifiers. As discussed in the PostgreSQL 8.3 documentation, there is no generation methodology that is universally appropriate, but PostgreSQL does ship with a few useful candidates.

Given the scope of your problem and the need for offline writes, you've quite neatly boxed out the use of anything but a GUID, so there are no compensatory advantages to other schemes.

From a functional standpoint, key length is usually not an issue on any modern system, though it can matter with a high read volume or very large tables. As an alternative methodology, offline clients could batch new records without a primary key and simply insert them when reconnecting. Since PostgreSQL offers the `SERIAL` datatype, clients never need to determine the ID themselves if they can perform a simple write to the database.
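A sketch of the GUID approach, assuming the `uuid-ossp` module is available (`CREATE EXTENSION` requires 9.1+; on 8.3 the contrib module was installed via SQL script). Table and column names are hypothetical:

```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Each client, online or offline, can generate its own key
-- with negligible collision risk:
CREATE TABLE sync_item (
  id   uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
  data text
);

INSERT INTO sync_item (data) VALUES ('created while offline');
```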