Ubuntu – Find and replace regular expression (matching unicode character class Devanagari) in multiple files

command line, perl, regex, text processing

Say you have a file named test.txt with the following lines:

ಕದಂಬ
कदम्ब

Then, suppose you want to replace each Devanagari Unicode character (i.e., those in कदम्ब) with a D. You might think that the following would work:

find . -name 'test.*' | xargs perl -w -i -p -e 's/(\p{Devanagari})/D/g'

But it doesn't. How can this be accomplished?

Best Answer

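As suggested by steeldriver, you can add the -C switch so that perl treats its I/O as UTF-8. Without it, perl reads the file as raw bytes, and no single byte matches \p{Devanagari}. From perlrun: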

-C on its own (not followed by any number or option list), or the empty string "" for the PERL_UNICODE environment variable, has the same effect as -CSDL. In other words, the standard I/O handles and the default open() layer are UTF-8-fied but only if the locale environment variables indicate a UTF-8 locale.

find . -name 'test.*' 2>/dev/null | xargs perl -w -C -i -p -e 's/(\p{Devanagari})/D/g'

This transforms your sample file as follows:

ಕದಂಬ
DDDDD

Source: http://perldoc.perl.org/perlrun.html#Command-Switches

Related Solutions

Ubuntu – Using text list to batch-rename files

This looks like a job for xargs.

If your file is formatted like this:

old_file1 new_file1
old_file2 new_file2

then you can do xargs -a your_file -n 2 mv.
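
As a rough illustration of what that command does, here is a sketch run in a throwaway directory. The directory and the touch-created files are assumptions for the demo; -a is a GNU xargs option, and xargs -n 2 mv < your_file should behave the same where -a is unavailable. Names containing whitespace or quotes would be split incorrectly by this approach.

```shell
# Throwaway directory with two files to rename (hypothetical names).
orig_dir=$PWD
tmp=$(mktemp -d)
cd "$tmp" || exit 1
touch old_file1 old_file2

# One "old new" pair per line, in the format shown above.
printf '%s\n' 'old_file1 new_file1' 'old_file2 new_file2' > your_file

# -a reads the arguments from the file; -n 2 hands them to mv two at a time,
# so xargs runs "mv old_file1 new_file1" and then "mv old_file2 new_file2".
xargs -a your_file -n 2 mv

renamed=$(echo new_file*)
printf '%s\n' "$renamed"

cd "$orig_dir" || exit 1
rm -rf "$tmp"
```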

Ubuntu – How to replace multiple lines with single word in file(inplace replace)

This can be done very easily in perl:

$ perl -i -p0e 's/START.*?END/SINGLEWORD/s' file
$ cat file
My block of line starts from here 
SINGLEWORD
and end to here for example.

Explanation

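-i edits the file in place

-0 sets the input record separator to the null character, so the whole file is read as a single record (assuming it contains no null bytes)

-p applies the script given by -e to each record and prints the result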

The regexp modifier:

/s Treat string as single line. That is, change . to match any character whatsoever, even a newline, which normally it would not match.

Why the ?:

By default, a quantified subpattern is "greedy", that is, it will match as many times as possible (given a particular starting location) while still allowing the rest of the pattern to match. If you want it to match the minimum number of times possible, follow the quantifier with a ?.
