PostgreSQL – What are the performance implications of using uuid as primary key in Postgres 10.12? (need canonical answer)

index | postgresql | postgresql-10 | primary-key

I'm at a crossroads where I need to decide whether to stick with bigserial as my primary key or switch to uuid (not auto-generated: my API server will generate the ID using UUID v4 and insert it).

I've spent hours researching bigserial vs uuid primary keys, and it seems no one can agree on what the disadvantages of uuid would be (if any). My database isn't that complicated: it's a series of tables with pretty basic relationships, I'm typically only inserting one row at a time, I have a handful of jsonb fields I'm using here and there. Write speed/frequency will only be significantly high on one table in particular.

The reason I started looking into UUIDs wasn't that I think I'll ever run out of bigint keys (about 9 quintillion, if I recall), but rather obfuscation. Right now I'm having to hash the IDs on the front end to avoid exposing raw DB IDs in the URL (e.g. /things/2732). Using Hashids, I can instead have URLs like /things/To2jZP13dG. But I thought I could go a step further and just use UUIDs, which give no clue as to the number of records. What I don't like is the added effort of encoding IDs before passing them to the back end and decoding them there, and then, when querying batches of 50-100 items, having to bulk-encode all those IDs before returning them to the client.
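For what it's worth, the encode/decode step I'm doing can be sketched with nothing but the standard library. This is not Hashids itself, just a minimal illustration of the same reversible-obfuscation idea (the XOR mask and function names are made up for the example):

```python
import base64

SECRET = 0x5DEECE66D  # illustrative XOR mask, not a real Hashids salt

def encode_id(db_id: int) -> str:
    """Obfuscate an integer DB ID into a short URL-safe token (XOR + base64)."""
    masked = db_id ^ SECRET
    raw = masked.to_bytes((masked.bit_length() + 7) // 8 or 1, "big")
    return base64.urlsafe_b64encode(raw).rstrip(b"=").decode()

def decode_id(token: str) -> int:
    """Reverse encode_id back to the original integer DB ID."""
    padded = token + "=" * (-len(token) % 4)
    masked = int.from_bytes(base64.urlsafe_b64decode(padded), "big")
    return masked ^ SECRET
```

Note this only hides the number from casual eyes; unlike a random UUID, it is trivially reversible by anyone who knows the mask.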

One argument for random UUIDs (uuid v4) was:

If your primary key is an incrementing ID, those are stored physically next to each other. That database page can have contention as many people are writing to it. Random IDs prevent contention by spreading the writes all over the DB

But then I found a contradictory statement here:

Regular random UUIDs are distributed uniformly over the whole range of possible values. This results in poor locality when inserting data into indexes – all index leaf pages are equally likely to be hit, forcing the whole index into memory. With small indexes that's fine, but once the index size exceeds shared buffers (or RAM), the cache hit ratio quickly deteriorates.
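The locality point in that quote is easy to visualize client-side: the leading bytes of a key decide roughly which part of a B-tree index an insert lands in. A quick Python sketch (variable names are illustrative, and 10,000 is an arbitrary sample size):

```python
import uuid

# First byte of each key ~ which "neighborhood" of the index an insert hits.
sample = [uuid.uuid4().bytes[0] for _ in range(10_000)]
distinct_prefixes = len(set(sample))  # v4 UUIDs spread over all 256 values

# Sequential bigserial values share the same leading bytes for long stretches,
# so consecutive inserts keep landing on the same (rightmost) leaf pages.
seq = [n.to_bytes(8, "big")[0] for n in range(1, 10_001)]
```

Random keys touch leaf pages all over the index (bad cache hit ratio once the index outgrows RAM); sequential keys concentrate on one hot page (potential contention, but great locality). Both quotes are describing the same behavior from opposite angles.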

I know the folks at Heroku are fans of using UUIDs as primary keys. For whatever it's worth, I wasn't planning on having Postgres auto-generate the IDs at all. Instead, my API server would generate a v4 UUID and pass that to the database (this would allow my API server & front end client to be more efficient and not have to always use RETURNING id statements in my queries).
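A minimal sketch of that flow, assuming a made-up things table and psycopg-style %s placeholders; the point is only that the client knows the key up front, so no RETURNING id round-trip is needed:

```python
import uuid

new_id = uuid.uuid4()  # generated on the API server, not by Postgres

# The client already holds the key, so the INSERT needs no RETURNING clause;
# the table and column names here are invented for illustration.
sql = "INSERT INTO things (id, name) VALUES (%s, %s)"
params = (str(new_id), "example")
```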

Does anyone have a canonical answer on the true cost of INSERT statements when using uuid as the primary key?

Best Answer

It is easy enough to benchmark this, but the INSERT performance of UUIDs will be worse, because they are bigger and slower to generate.
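If you want to see the "bigger and slower to generate" part for yourself without touching the database, a rough client-side micro-benchmark (absolute numbers will vary by machine) might look like:

```python
import time
import uuid

N = 100_000

t0 = time.perf_counter()
seq_keys = list(range(1, N + 1))              # bigserial analogue: a trivial counter
t_seq = time.perf_counter() - t0

t0 = time.perf_counter()
uuid_keys = [uuid.uuid4() for _ in range(N)]  # 122 random bits each, from the OS CSPRNG
t_uuid = time.perf_counter() - t0

# Beyond generation cost: a uuid is 16 bytes on disk versus 8 for bigint,
# so every index entry and every foreign key referencing it is twice as wide.
key_width_ratio = len(uuid_keys[0].bytes) / 8
```

The index-size and write-amplification effects inside Postgres only show up at scale, so a real benchmark against your actual tables is still the only canonical answer.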

But it doesn't sound like you are building a high-performance application anyway (otherwise you probably wouldn't be using jsonb), so it probably won't make much difference.

Finally, you want to use UUIDs for security (obfuscation) reasons, which I won't discuss here. Security always carries a penalty in performance and usability, so consider it the price you are paying for security.