I have the following file:
<?xml version="1.0" encoding="utf-8"?>
<!--Generated by crowdin.net-->
<string name="test" >- test</string>
<string name="test" >test-test</string>
<string name="test" >test - test</string>
and I would like to replace the dash (`-`) with the en dash's Unicode value (`&#8211;`), but not all of them, only the ones inside the `<string>` tags.
I ran several `sed` commands with different regexes, but I couldn't figure it out. One of them was:
sed -i.bak "s/-[^-\<\>0-9]/\&#8211\;/g" strings.xml
The output was:
<?xml version="1.0" encoding="utf-8"?>
<!-&#8211;enerated by-->
<string name="test" >&#8211;test</string>
<string name="test2" >test&#8211;est</string>
<string name="test3" >test &#8211;test</string>
My problem is that it also replaces spaces and the first character of the second word. I don't have much experience with regex and `sed`. Could you please explain what I am doing wrong?
Note: I'm using OSX.
Best Answer
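As for what goes wrong with the `sed` attempt: `[^-\<\>0-9]` is a bracket expression that matches exactly one character, so that character becomes part of the match and is replaced together with the `-`. A minimal reproduction (using `X` instead of the entity to keep the output readable, and a simplified bracket expression) shows the effect:

```shell
#!/bin/sh
# The bracket expression after "-" consumes the next character,
# so that character is replaced along with the dash.
printf '%s\n' 'test-test' 'test - test' | sed 's/-[^-<>0-9]/X/g'
# prints:
# testXest
# test Xtest
```

The `t` in the first line and the space in the second disappear, which is exactly the symptom described in the question.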
With a recent `perl` (5.14 or later, for `\K` and `s///r`), and assuming your `<string>` tags don't nest, a single `perl` command can do it. Here is what each part of that command does:

- `-0777`: slurp mode, handling the whole file at once (to allow `<string>` tags to span several lines).
- `-p`: `sed` mode.
- `-i.bak`: in-place editing with a `.bak` extension (BTW, that's where some `sed` implementations got that idea from).
- `s{...}{...}ges`: substitute globally (`g`), where `.` matches newline characters as well (`s`), and treat the replacement as `perl` code to execute (`e`).
- `<string.*?>\K.*?(?=</string>)`: match from `<string...>` to `</string>` but don't include the tags themselves in the part that is matched (`\K` defines where the matched portion starts, and `(?=...)` is a look-ahead operator that only checks that `</string>` is there, without including it in the match).
- `$&=~s/.../.../rg`: do the substitution on the matched part (`$&`). The `r` flag makes it leave `$&` unmodified and return the substituted string instead.