Ignore part of the string with sed

Tags: replace, sed, text-processing

I have files with lines formatted like this:

untranslatedString : "translated string",

I need to replace the characters in the "translated string" part with their Cyrillic counterparts. I currently use something like this:

paste <(sed 's/\([^:]\+:\)\([^:]\+\)/\1/' resources.js) <(sed 's/[^:]\+:\([^:]\+\)/\1/;y/abc/абц/' resources.js)

(The abc/абц mapping is actually longer and covers every character; it is shortened here for illustration.)

The problem arises with lines like this one:

abcTestString : "abc {ccb} bbc",

Everything between {} should be left in its original state, i.e. those characters must not be replaced. The result should be:

abcTestString : "абц {ccb} ббц",

and not:

abcTestString : "абц {ццб} ббц",

Also, there can be multiple {} parts per line.

How can I do that?

Best Answer

If you are okay with using perl:

$ s='abcTestString : "abc {ccb} bbc",'
$ echo "$s" | perl -Mopen=locale -Mutf8 -F: -lane '
               $F[-1]=~s/\{[^{}]+\}(*SKIP)(*F)|[a-z]+/$&=~tr|abc|абц|r/ge;
               print join ":",@F'
abcTestString : "абц {ccb} ббц",

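-Mopen=locale -Mutf8 Unicode handling: read and write using the locale's character encoding, and treat the code itself (including the абц literal) as UTF-8 (thanks to this wonderful answer to "tr analog for unicode characters?")
-F: -lane -F: splits each line on :, -a stores the fields in the @F array, -l strips the newline from each input line and adds it back on print, -n loops over the input lines (see https://perldoc.perl.org/perlrun.html#Command-Switches for other options)
$F[-1] the last field of @F, i.e. the text after the last :
\{[^{}]+\}(*SKIP)(*F)|[a-z]+ a {...} group is matched by the first alternative and then rejected: (*F) forces a failure and (*SKIP) makes the search resume after the closing brace, so only runs of [a-z] outside braces get replaced
$&=~tr|abc|абц|r transliterate the matched text; the r modifier returns the transliterated copy instead of modifying $& in place
ge g replaces every match; e evaluates the replacement part as Perl code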

If this is too much code to handle on the command line, turn it into a program; -MO=Deparse shows the equivalent script:

$ echo "$s" | perl -MO=Deparse -Mopen=locale -Mutf8 -F: -lane '
               $F[-1]=~s/\{[^{}]+\}(*SKIP)(*F)|[a-z]+/$&=~tr|abc|абц|r/ge;
               print join ":",@F'
BEGIN { $/ = "\n"; $\ = "\n"; }
use open (split(/,/, 'locale', 0));
use utf8;
LINE: while (defined($_ = <ARGV>)) {
    chomp $_;
    our @F = split(/:/, $_, 0);
    $F[-1] =~ s[\{[^{}]+\}(*SKIP)(*F)|[a-z]+][use utf8 ();
    $& =~ tr/abc/\x{430}\x{431}\x{446}/r;]eg;
    print join(':', @F);
}
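If perl is not an option, the same idea can be sketched in plain POSIX awk: keep everything up to the first :, then walk the rest of the line character by character, counting { and } and replacing a letter only while the brace depth is zero. This is a swapped-in technique, not part of the answer above; the function name translit and the a/b/c table are illustrative. Because only ASCII letters are replaced, it should behave the same whether awk walks the line by bytes or by characters.

```shell
# Sketch: transliterate only outside {...}, using POSIX awk instead of perl.
# The function name and the a/b/c table are illustrative; extend the two
# lists in BEGIN with the full mapping.
translit() {
    awk '
    BEGIN {
        # Parallel lists: Latin letter -> Cyrillic replacement.
        n = split("a b c", from, " ")
        split("а б ц", to, " ")
        for (i = 1; i <= n; i++) map[from[i]] = to[i]
    }
    {
        p = index($0, ":")
        if (p == 0) { print; next }      # no ":" on this line: pass it through
        out = substr($0, 1, p)           # the key and the first ":" stay as-is
        depth = 0                        # current {...} nesting level
        for (i = p + 1; i <= length($0); i++) {
            c = substr($0, i, 1)
            if (c == "{") depth++
            else if (c == "}" && depth > 0) depth--
            else if (depth == 0 && (c in map)) c = map[c]
            out = out c
        }
        print out
    }' "$@"
}

printf '%s\n' 'abcTestString : "abc {ccb} bbc",' | translit
# prints: abcTestString : "абц {ccb} ббц",
```

Unlike the perl version, which edits only the last :-separated field, this splits at the first :, so a : inside the translated string does not cut it short.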

Related Solutions

How to Read Needle Part of Sed Command from a File

You can have the shell expand the file's contents before passing them to sed:

sed -e "s/$(cat needle.txt)/replace/" subject.txt

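Note the double quotes: they let the shell perform the $(...) substitution.

This makes sed interpret any regex metacharacters in needle.txt (., *, [ and so on) as metacharacters rather than ordinary characters. It also breaks if needle.txt contains a / (the delimiter of the s command) or spans more than one line.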
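If you would rather stay with sed, one common workaround is to backslash-escape the needle before substituting it, so that BRE metacharacters and / lose their special meaning. A minimal sketch, assuming sed with basic regular expressions and a needle.txt that holds a single line; the function name literal_replace is hypothetical.

```shell
# Sketch: make sed match the needle literally by escaping it first.
# Assumes needle.txt holds a single line; literal_replace is a
# hypothetical helper name (usage: literal_replace needle.txt subject.txt).
literal_replace() {
    # Backslash-escape the BRE metacharacters [ \ . * ^ $ and the / delimiter.
    # "|" delimits this s command so "/" can sit inside the bracket expression.
    # A lone "]" needs no escaping once "[" is escaped.
    needle=$(sed 's|[[\.*^$/]|\\&|g' "$1")
    sed "s/$needle/replace/" "$2"
}
```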

If you want the contents of needle.txt to be matched literally (even when they contain regex metacharacters, as in your example), you can do something like this in Perl:

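perl -pe '
    BEGIN {
        open my $IN, "<", "needle.txt" or die "needle.txt: $!";
        local $/;                          # slurp mode: read the whole file at once
        ($needle = <$IN>) =~ s/\n\z//;     # drop the trailing newline, as $(cat ...) would
    }
    s/\Q$needle/replace/                   # \Q: match $needle as a fixed string
' subject.txt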

Explanation

-p runs the code given with -e on each line of subject.txt and prints the line after processing it.
The BEGIN{} block runs only once, before the first line is read. It opens needle.txt and, with $/ localized to undef (slurp mode), reads its entire contents into the $needle variable.
s/\Q$needle/replace/ is the same syntax you'd expect from sed, except that \Q backslash-escapes every regex metacharacter in the interpolated $needle (up to an \E, if any), so Perl's regex engine matches it as a fixed string rather than as a regex.
