Ubuntu – Find and replace regular expression (matching unicode character class Devanagari) in multiple files

Tags: command-line, perl, regex, text-processing

Say you have a file named test.txt with the following lines:

ಕದಂಬ
कदम्ब

Suppose you want to replace each Devanagari Unicode character (i.e. each character in कदम्ब) with a D. You might expect the following to work:

find . -name 'test.*' | xargs perl -w -i -p -e 's/(\p{Devanagari})/D/g'

But it doesn't: the file is left unchanged. How can this be done?

Best Answer

As suggested by steeldriver, you can add Perl's -C switch so that the standard I/O handles (and the files Perl opens) are treated as UTF-8. From perlrun:

-C on its own (not followed by any number or option list), or the empty string "" for the PERL_UNICODE environment variable, has the same effect as -CSDL. In other words, the standard I/O handles and the default open() layer are UTF-8-fied but only if the locale environment variables indicate a UTF-8 locale.

find . -name 'test.*' 2>/dev/null | xargs perl -w -C -i -p -e 's/(\p{Devanagari})/D/g'

This transforms the sample file as follows:

ಕದಂಬ
DDDDD

Source: http://perldoc.perl.org/perlrun.html#Command-Switches