I have files with text formatted like this:
untranslatedString : "translated string",
and I need to replace the characters in the "translated string" part with their Cyrillic equivalents. I currently use something like this:
paste <(sed 's/\([^:]\+:\)\([^:]\+\)/\1/' resources.js) <(sed 's/[^:]\+:\([^:]\+\)/\1/;y/abc/абц/' resources.js)
(The abc/абц/ part is actually longer and covers all characters; it is shortened here for illustration.)
The problem arises in lines like this one:
abcTestString : "abc {ccb} bbc",
Everything between {} should be left in its original state, i.e. those characters shouldn't be replaced. The result should be:
abcTestString : "aбц {ccb} ббц",
and not
abcTestString : "aбц {ццб} ббц",
Also, there can be multiple {} parts per line.
How can I do that?
Best Answer
If you are okay with using perl, these are the pieces involved:

-Mopen=locale -Mutf8
    Unicode settings (thanks to this wonderful answer: "tr analog for unicode characters?")
-F: -lane
    use : as the field separator, with the fields saved in the @F array (see https://perldoc.perl.org/perlrun.html#Command-Switches for other options)
$F[-1]
    the last field of the @F array
\{[^{}]+\}(*SKIP)(*F)|[a-z]+
    here we say that the [a-z]+ portion has to match, but anything matching \{[^{}]+\} should be left as it is
$&=~tr|abc|абц|r
    perform transliteration on the matched portion
ge
    the g modifier replaces all matches; the e modifier allows Perl code in the replacement section

If this is too much code to handle on the command line, turn it into a program.