Postgresql – Would index lookup be noticeably faster with char vs varchar when all values are 36 chars

indexperformancepostgresqlvarchar

I have a legacy schema (disclaimer!) that uses a hash-based generated id for the primary key for all tables (there are many). An example of such an id is:

922475bb-ad93-43ee-9487-d2671b886479

There is no possible hope of changing this approach, however performance with index access is poor. Setting aside the myriad of reasons this might be, there is one thing I noticed that seemed less than optimal – despite all id values in all many tables being exactly 36 characters in length, the column type is varchar(36), not char(36).

Would altering the column types to fixed length char(36) offer any significant index performance benefits, beyond the very small increase in the number of entries per index page etc?

Ie does postgres perform much faster when dealing with fixed-length types than with variable length types?

Please don't mention the minuscule storage saving – that's not going be matter compared with the surgery required to make the change to the columns.

Best Answer

No. No gain at all. The manual explicitly states:

Tip: There is no performance difference among these three types, apart from increased storage space when using the blank-padded type, and a few extra CPU cycles to check the length when storing into a length-constrained column. While character(n) has performance advantages in some other database systems, there is no such advantage in PostgreSQL; in fact character(n) is usually the slowest of the three because of its additional storage costs. In most situations text or character varying should be used instead.

Bold emphasis mine.

char(n) is a largely outdated, useless type. Stick with varchar(n). Without need to enforce a maximum length, varchar or text are a tiny bit faster. (You won't be able to measure a difference.)

If all strings are exactly 36 characters in length, there is no storage saving either way, not even a minuscule one. Both have exactly the same size on disk and in RAM. You can test with pg_column_size() (on an expression and on a table column).

And if all strings must have 36 characters, rather make it text with a CHECK (length(col) = 36) constraint enforcing exact length, not varchar(36) only enforcing max. length.

Any downsides of using data type “text” for storing strings?

You didn't ask for other options, but I'll mention two:

COLLATION - unless you are running your DB with the "C" collation. Collation is often overlooked and possibly expensive. Since your strings don't seem to be meaningful in a natural language, there is probably no point in following COLLATION rules. Related:
- How do I efficiently get "the most recent corresponding row"?
- EXECUTE within function not using index?
Extensive benchmark comparing (among other) the effect of COLLATE "C" on performance:
- Slow query ordering by a column in a joined table
UUID, obviously. Your string suspiciously looks like a UUID (32 hex digits plus 4 delimiters). It would be much more efficient to store these as actual uuid data type, which is faster in multiple ways and only occupies 16 bytes - as opposed to 37 bytes in RAM for either char(36) or varchar(36) (stored without delimiters, just the 32 defining char), or 33 bytes on disk. But alignment padding would result in 40 bytes either way in many cases.) COLLATION would be irrelevant for the uuid data type, too.
```
SELECT '922475bb-ad93-43ee-9487-d2671b886479'::uuid
```
This may be helpful (last chapters):
- Convert hex in text representation to decimal number
See also:
- What is the overhead for varchar(n)?
- What is the optimal data type for an MD5 field?

Best Answer

Related Solutions

Postgresql – Indexing to improve performance of range queries

Sql-server – Will an index that is a partial key of the begining of another index perform any better than the longer index

Related Question