I believe it won't work as well, and here's why.

A Mongo ObjectID is 12 bytes, larger than an `INT` (4 bytes) or a `BIGINT` (8 bytes). Remember, in InnoDB every secondary index references the primary key, so a wide primary key bloats all the other indexes. This might not be a concern, but it's something to be aware of.
While the timestamp portion is sequential, the machine id, process id, and random value are not always going to be, so values generated within the same second won't always sort in ascending order.
So, in conclusion, I don't believe the Mongo ObjectID will perform as well as a smaller, ascending key.

That said, you can still use it as a primary key. It doesn't need to be ascending; it just means you may get index leaf-page splits on inserts, which, if they happen close enough to the end of the index, won't hurt performance much. And you can always rebuild the table to compact the indexes.
So, more about the leaf splits. By default, InnoDB aims for a 15/16 page fill, leaving 1/16th free for future changes. When data is inserted in sequential order, the index pages end up consistently filled to 15/16.

When data is inserted randomly/out of order, InnoDB keeps the pages between 1/2 and 15/16 full, depending on what it estimates the final fill rate of each index page will be. This obviously wastes disk space, and memory when the index is loaded into the buffer pool.
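If that fragmentation does become a problem, rebuilding the table repacks the pages. A minimal sketch (the table name is hypothetical):

```sql
-- For InnoDB, OPTIMIZE TABLE is mapped to ALTER TABLE ... FORCE,
-- which rebuilds the clustered index and all secondary indexes,
-- repacking the pages toward the default fill.
OPTIMIZE TABLE tbl;

-- Equivalent explicit rebuild:
ALTER TABLE tbl FORCE;
```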
I suggest a function taking a `regclass` parameter that runs `ALTER SEQUENCE` with a new randomly generated increment before it returns the next value from the given sequence. It can be used as a drop-in replacement for `nextval()`.

Per the documentation on `ALTER SEQUENCE`:

> `increment`
>
> The clause `INCREMENT BY increment` is optional. A positive value will make an ascending sequence, a negative one a descending sequence. If unspecified, the old increment value will be maintained.
However:

> You must own the sequence to use `ALTER SEQUENCE`.

So we need to take care of privileges. You could make the function `SECURITY DEFINER` and owned by a superuser. If you don't `REVOKE` privileges from `public`, it works for anyone on any sequence. There are two basic strategies to restrict usage:
- To allow for selected sequences only, change the owner of those sequences to some dedicated role, say `randseq`, and make `randseq` own the function (still with `SECURITY DEFINER`).
- To allow for selected roles only, `REVOKE` all privileges on the function from `public` and `GRANT EXECUTE` to said roles. You might use a group role to simplify privilege management.
Or combine both:
```sql
CREATE OR REPLACE FUNCTION nextval_rand(regclass)
  RETURNS int AS
$func$
BEGIN
   EXECUTE format('ALTER SEQUENCE %s INCREMENT %s'
                , $1                           -- regclass is automatically sanitized
                , (random() * 100)::int + 1);  -- random increment between 1 and 101
   RETURN nextval($1)::int;
END
$func$ LANGUAGE plpgsql SECURITY DEFINER;

-- to restrict usage:
ALTER FUNCTION nextval_rand(regclass) OWNER TO randseq;
REVOKE ALL ON FUNCTION nextval_rand(regclass) FROM public;
GRANT EXECUTE ON FUNCTION nextval_rand(regclass) TO randseq;
GRANT randseq TO ???;
```
Call:

```sql
SELECT nextval_rand('tbl_tbl_id_seq'::regclass);
```
All you have to do now is replace `nextval()` with `nextval_rand()` in the column default of any serial column. And possibly change the owner of the sequence.

```sql
ALTER SEQUENCE tbl_tbl_id_seq OWNER TO randseq;
ALTER TABLE tbl ALTER COLUMN tbl_id SET DEFAULT nextval_rand('tbl_tbl_id_seq'::regclass);
```
Notes

- `ALTER SEQUENCE` is designed not to block concurrent transactions. It takes effect immediately and cannot be rolled back. It should work reliably in a multi-user environment. Read the Notes section of the manual page for the fine print of `ALTER SEQUENCE` behavior.
- There is a very slim chance of a race condition, where two concurrent operations each run `ALTER SEQUENCE` before calling `nextval()`. Since we are operating with random numbers anyway, this really doesn't matter.
- Since we are running dynamic SQL, I would normally `SET search_path = public, pg_temp` for the function. But since the parameter is `regclass`, only valid sequence names can be passed, and they are automatically schema-qualified and escaped unambiguously.
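As a quick illustration of why the `regclass` parameter guards against injection (the first sequence name is hypothetical):

```sql
-- A regclass cast is resolved against the system catalogs,
-- so only names of existing relations are accepted:
SELECT 'tbl_tbl_id_seq'::regclass;       -- OK, if the sequence exists
SELECT 'x; DROP TABLE tbl'::regclass;    -- ERROR: injection attempt rejected
```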
Best Answer
It depends on your generation function and the size of the final tables.

GUIDs are intended to be globally unique identifiers. As discussed in the PostgreSQL 8.3 documentation, there is no generation methodology that is universally appropriate, but PostgreSQL does ship with a few useful candidates.

Given the scope of your problem and the need for offline writes, you've quite neatly boxed out the use of anything but a GUID, so there are no compensatory advantages to other schemes.

From a functional standpoint, key length is usually not an issue on any modern system, though it can matter with a high read volume or very large tables. As an alternative methodology, offline clients could batch new records without a primary key and simply insert them when reconnecting. Since PostgreSQL offers the `SERIAL` datatype, clients never need to determine the ID themselves if they can perform a simple write to the database.
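A sketch of the GUID approach, assuming the `uuid-ossp` module is available (`CREATE EXTENSION` requires 9.1+; on 8.3 the contrib module was installed via SQL script). Table and column names are hypothetical:

```sql
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

-- Each client, online or offline, can generate its own key
-- with negligible collision risk:
CREATE TABLE sync_item (
  id   uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
  data text
);

INSERT INTO sync_item (data) VALUES ('created while offline');
```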