Ignore part of the string with sed

Tags: replace, sed, text-processing

I have files with text formatted like this:

untranslatedString : "translated string",

I need to replace the characters in the "translated string" part with their Cyrillic equivalents. I currently use something like this:

paste <(sed 's/\([^:]\+:\)\([^:]\+\)/\1/' resources.js) <(sed 's/[^:]\+:\([^:]\+\)/\1/;y/abc/абц/' resources.js)

(The abc/абц/ part is actually longer and covers all characters; it is shortened here for illustration.)

The problem arises in lines like this one:

abcTestString : "abc {ccb} bbc",

Everything between {} should be left in its original state, i.e. those characters shouldn't be replaced. The result should be:

abcTestString : "aбц {ccb} ббц",

and not

abcTestString : "aбц {ццб} ббц",

Also, there can be multiple {} parts per line.

How can I do that?

Best Answer

If you are okay with using Perl:

$ s='abcTestString : "abc {ccb} bbc",'
$ echo "$s" | perl -Mopen=locale -Mutf8 -F: -lane '
               $F[-1]=~s/\{[^{}]+\}(*SKIP)(*F)|[a-z]+/$&=~tr|abc|абц|r/ge;
               print join ":",@F'
abcTestString : "абц {ccb} ббц",
  • -Mopen=locale -Mutf8 Unicode settings (thanks to the answer to the question "tr analog for unicode characters?")
  • -F: -lane use : as field separator, saved in @F array (See https://perldoc.perl.org/perlrun.html#Command-Switches for other options)
  • $F[-1] last field of @F array
  • \{[^{}]+\}(*SKIP)(*F)|[a-z]+ matches runs of [a-z], but a \{[^{}]+\} group is matched first and then deliberately failed by (*SKIP)(*F), so the regex skips past it and the text inside {} is left as it is
  • $&=~tr|abc|абц|r perform transliteration for the matched portion
  • ge the g modifier replaces all matches; the e modifier evaluates the replacement section as Perl code


If this is too much code to handle on the command line, turn it into a program. Adding -MO=Deparse prints the equivalent script:

$ echo "$s" | perl -MO=Deparse -Mopen=locale -Mutf8 -F: -lane '
               $F[-1]=~s/\{[^{}]+\}(*SKIP)(*F)|[a-z]+/$&=~tr|abc|абц|r/ge;
               print join ":",@F'
BEGIN { $/ = "\n"; $\ = "\n"; }
use open (split(/,/, 'locale', 0));
use utf8;
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    our @F = split(/:/, $_, 0);
    $F[-1] =~ s[\{[^{}]+\}(*SKIP)(*F)|[a-z]+][use utf8 ();
    $& =~ tr/abc/\x{430}\x{431}\x{446}/r;]eg;
    print join(':', @F);
}