PostgreSQL – Character varying index overhead & length limit

Tags: btree, disk-space, index, performance, postgresql

I have a unique constraint on a character varying column whose values will mostly be around 600 characters long but could reach 10 million.

I have read about B-Trees, but I can't determine how much disk space the index will consume, or whether such large values will keep the index from performing well.

What are the impacts on disk space and query time of a B-Tree index on character varying columns with such potentially large values?

Best Answer

In Postgres, a unique constraint is implemented with a unique B-tree index. Per documentation:

Adding a unique constraint will automatically create a unique btree index on the column or group of columns used in the constraint.
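
For illustration (a minimal sketch; the table and column names are made up), declaring the constraint is enough to get the index:

    -- Hypothetical table: the UNIQUE constraint implicitly creates
    -- a unique btree index (psql's \d t lists it as "t_val_key").
    CREATE TABLE t (
        val varchar UNIQUE
    );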

Indexes use the same basic storage mechanism as tables: arrays of data pages, plus some minor additional features. Per documentation:

In a table, an item is a row; in an index, an item is an index entry.

Every table and index is stored as an array of pages of a fixed size (usually 8 kB, although a different page size can be selected when compiling the server).
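
You can check the page size your own server was compiled with (8192 bytes unless it was built differently):

    -- Returns the compile-time page size, typically 8192.
    SHOW block_size;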

The maximum size of an index entry is roughly a third of a data page; Peter Eisentraut explains this in:
PostgreSQL primary key length limit

That's 2730 bytes (8192 / 3, or a bit less), meaning your UNIQUE constraint is not possible for values anywhere near 10 million characters.
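
As an illustration, reusing the hypothetical table t from the sketch above (the exact byte limit and error wording vary with Postgres version and page size), an oversized, poorly compressible value is rejected when it is added to the unique index:

    -- ~600 characters: comfortably fits in one index entry.
    INSERT INTO t (val) VALUES (repeat('x', 600));

    -- ~6400 bytes of pseudo-random text: too large for a single index entry,
    -- so the INSERT fails with an "index row size ... exceeds maximum" error.
    -- (Highly repetitive strings may still fit, because index entries are
    -- compressed before the limit is checked.)
    INSERT INTO t (val)
    SELECT string_agg(md5(i::text), '')
    FROM generate_series(1, 200) AS i;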

I would consider adding a redundant column holding a hash of the value and creating a unique index on that.
You could use the built-in function md5(), or something faster for huge values.
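
A minimal sketch of that approach, assuming made-up names and Postgres 12 or later for the generated column (on older versions a trigger or explicit assignment works the same way):

    -- Big column without its own UNIQUE constraint, plus a redundant
    -- column that always holds the md5 hash of the value.
    CREATE TABLE big_docs (
        val     varchar,
        val_md5 text GENERATED ALWAYS AS (md5(val)) STORED
    );

    -- Enforce uniqueness on the short, fixed-length hash (32 hex characters)
    -- instead of on the potentially huge value itself.
    CREATE UNIQUE INDEX big_docs_val_md5_uni ON big_docs (val_md5);

The index stays small no matter how long the values grow; the trade-off is that two distinct values colliding on md5 would be rejected as duplicates, which is astronomically unlikely but not strictly impossible.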

Here is a related solution with a focus on performance:
Index max row size error