Replace UTF-8 characters with perl from the shell

How do I get perl to properly replace UTF-8 characters from the shell?

The examples below use stdin, but I need something that also works for perl ... file.

This is what I expect:

$ echo ABCæøåDEF | perl -CS -pe "s/([æøå])/[\\1]/g"
ABC[æ][ø][å]DEF

This is what I get:

$ echo ABCæøåDEF | perl -CS -pe "s/([æøå])/[\\1]/g"
ABCæøåDEF

The same command with ASCII characters instead of the Unicode ones works as expected:

$ echo ABC123DEF | perl -CS -pe "s/([123])/[\\1]/g"
ABC[1][2][3]DEF

My environment:

perl 5.18.2
Bash 3.2.57
LC_ALL=en_US.UTF-8
LANG=en_US.UTF-8

Best Answer

Use this:

 $ echo 'ABCæøåDEF' |
    perl -CSD -Mutf8 -pe 's/([æøå])/[$1]/g'

This also works for files.

Output:

ABC[æ][ø][å]DEF

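Notes:

  • -Mutf8 tells perl that the code given with -e is UTF-8, so [æøå] is a class of three characters; without it the class is built from the six UTF-8 bytes, which never match the characters that -CS decodes from the input (this is why the ASCII version worked)
  • substitutions: in the replacement part of s/// use $1. \1 is the sed spelling; perl still accepts it there, but perl -w warns "\1 better written as $1". In the question, the shell turned \\1 inside double quotes into \1; single quotes pass the code to perl untouched
  • check perldoc perlrun for the -C flags: S turns on UTF-8 for STDIN, STDOUT and STDERR, and D makes UTF-8 the default layer for files perl opens for input and output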