PostgreSQL – How to Translate Special Characters

postgresqlunaccent

Description:

PostgreSQL 9.3

String: 'ì ằ ú ề'

Desired Result: 'i a u e'

My code:

select translate ('ì ằ ú ề', 'ìằúề', 'iaue') ; -- it works. Result: i a u e

Question:

If I use it this way, I have to define a manual translation between 'ìằúề' and 'iaue'. Is there a better solution?

Ref: PG Document

Best Answer

Use unaccent() provided by the additional module unaccent.
Details:

Substantial improvements went into Postgres 9.5:

  • Allow empty replacement strings in unaccent (Mohammad Alhashash)

    This is useful in languages where diacritic signs are represented as separate characters.

  • Allow multicharacter source strings in unaccent (Tom Lane)

    This could be useful in languages where diacritic signs are represented as separate characters. It also allows more complex unaccent dictionaries.

Custom rules

If you edit the default rules (like you commented) any updates (or new compilations if you compile yourself) will overwrite your manual changes. Be sure to make a backup of both the original and your edited ruleset immediately.

The unaccent() function is overloaded. While the simple form uses the default dictionary 'unaccent':

SELECT unaccent('ì ằ ú ề');

You can chose a separate dictionary with the 2-parameter form:

SELECT unaccent('my_dictionary', 'ì ằ ú ề');

I would consider not to alter the default ruleset (like instructed in the link in your comment), but install your ruleset additionally under a custom name. This way you keep the default behavior, which receives all updates automatically, and you can clearly see in your code where you apply your alternative / extended ruleset. Follow instructions in the manual or consider this blog.

And consider my link above for use in indexes or other contexts where you need the function to be immutable.