What would be the most efficient way, within Postgres, to truncate a text
field to a maximum number of bytes, in such a way that the encoding is preserved?
I.e. how to keep at most N bytes while keeping a valid UTF8 string representation.
E.g. assuming UTF8, if a field contains abc€
, that's 4 characters for 5 bytes, if I want to truncate this field to at most 4 bytes, but keep a valid UTF8 string, I'd need to actually keep only the first 3 bytes.
I feel like I need the following, but instead of 'escape'
, I'd like to ignore the invalid bytes to only keep abc
:
select encode(substring('abc€'::bytea, 0, 5), 'escape');
encode
---------
abc\342
What's the best way to achieve this?
Best Answer
Use this function:
It works like this: