PostgreSQL – Conflict Resolution for Hash-Backed Unique Constraints

hashingindexpostgresqlpostgresql-9.6

So we've just run into

ERROR: index row size 2736 exceeds maximum 2712 for index "foo__bar__un"
  Hint: Values larger than 1/3 of a buffer page cannot be indexed.
Consider a function index of an MD5 hash of the value, or use full text indexing.

(foo is table, bar is column and un for "unique")

Consider a function index of an MD5 hash of the value

Alright, let's.
This is simple enough to do:

ALTER TABLE foo DROP CONSTRAINT foo__bar__un;
CREATE UNIQUE INDEX foo__bar__un ON foo(md5(bar));

So far, so good, this is the solution both proposed in the error hint and found online in numerous places.

What I haven't found yet, though, is how to deal with collisions.

In my case, I have a database with results for biochemical experiments. I can't just go to the scientists and say "can you please change your result a tiny bit so we can actually store it in our database? yes, I understand this may cost a couple of your customers' lives and you personally a potentially prohibitive amount of money, but I can't allow you to store it otherwise".

(Realistically, we'd just drop the constraint way before that, but you get my point.)

I do understand that collisions are almost negligibly rare, but I'd rather know I can deal with them if against all odds one should happen. And given that this isn't a niche issue only we are having, I'm sure I'm not the only one to have ever wondered about this.

So what is the way to deal with collisions, in this case? I'd rather not have to replace the unique index with a DIY trigger-based workaround, if possible.

Can I somehow shoehorn in an additional "equals" check to be executed for entries with the same hash, for example, and only throw an exception if they actually are?

Best Answer

I don't know why the results of a biochemical experiment would be enforced to be unique in the first place, but you can use an exclusion constraint that tests for equality. It will automatically resolve hash collisions by doing the character-by-character comparison.

alter table foo add constraint foo__bar__un exclude using hash (bar with =);

But you should upgrade to something newer than 9.6.