How to replace Unicode Character in Notepad++

find and replace

I have an .xlf file in which the Danish characters are displayed incorrectly.

How do I search for the Unicode character "xE5" and replace it with "æ"?
I tried searching for ^0145 = xE5 and replacing it with "æ", but that did not work.

If this is not possible in Notepad++, I could use another text editor (for example, UltraEdit).

Here is the pasted text from the file:

<?xml version="1.0" encoding="utf-8"?>
<xliff xmlns="urn:oasis:names:tc:xliff:document:1.2" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" version="1.2" xsi:schemaLocation="urn:oasis:names:tc:xliff:document:1.2 xliff-core-1.2-strict.xsd">
  <file xmlns:bind="http://bind.sorona.se" original="CTO12623_1_en-GB-da.xml" source-language="en" datatype="xml" date="2015-11-11T15:35:51Z" target-language="da" product-name="Anders_LP8504_151111" bind:file-id="78452" bind:file-hash="85075c54359fa47b087d6c67ec967f43">
    <header>
      <tool tool-name="Sorona TMS" tool-id="bind" tool-version="3.1.5" tool-company="Sorona Innovation" />
      <count-group name="word-count">
        <count count-type="total" unit="word">2743</count>
      </count-group>
    </header>
    <body>
      <trans-unit id="e1ca41ef868a74944745b8cd1dfa59e7" translate="yes" approved="no" restype="string" resname="p">
        <source>The trench compactor LP 8504 is a radio controlled trench compactor. It has a robust design and is suitable for compaction of medium to deep layers of cohesive and granular soils on limited areas such as trenches, construction back-fills and on roads. No other use is permitted.</source><seg-source><mrk mtype="seg" mid="1">The trench compactor LP 8504 is a radio controlled trench compactor. It has a robust design and is suitable for compaction of medium to deep layers of cohesive and granular soils on limited areas such as trenches, construction back-fills and on roads. No other use is permitted.</mrk></seg-source>
        <target state="translated"><mrk mtype="seg" mid="1">Vibrationstromlen LP 8504 er radiostyret. Den har et robust design og er beregnet til komprimering af middel til dybe lag af sammenh篧ende og granuleret jord p塢egr篳ede omr楥r s塳om gr?r, anl稳opfyldninger og p塶eje. Den m塩kke anvendes til andre form欮</mrk></target>
      </trans-unit>
      <trans-unit id="3b3dbf229f5f1f06ab9427d689c9740b" translate="yes" approved="no" restype="string" resname="p">
        <source>The LP trench compactor must only be used in well-ventilated areas, as is the case for all combustion engine machines.</source><seg-source><mrk mtype="seg" mid="2">The LP trench compactor must only be used in well-ventilated areas, as is the case for all combustion engine machines.</mrk></seg-source>
        <target state="translated"><mrk mtype="seg" mid="2">LP vibrationstromlen m塬ige som alle andre maskiner med forbr篤ingsmotorer kun bruges i godt ventilerede omr楥r.</mrk></target>
      </trans-unit>
      <trans-unit id="3ceced74b90bcbc582c1857395a8abf1" translate="yes" approved="no" restype="string" resname="p">
        <source>The LP trench compactor must not be towed behind vehicles.</source><seg-source><mrk mtype="seg" mid="3">The LP trench compactor must not be towed behind vehicles.</mrk></seg-source>
        <target state="translated"><mrk mtype="seg" mid="3">LP vibrationstromlen m塩kke sl磥s efter biler.</mrk></target>
      </trans-unit>
      <trans-unit id="c1ff7c8ab3ea4123fc2d5fb6a105d98b" translate="yes" approved="no" restype="string" resname="p">
        <source>Handbrake</source><seg-source><mrk mtype="seg" mid="4">Handbrake</mrk></seg-source>
        <target state="translated"><mrk mtype="seg" mid="4">H毤bremse</mrk></target>
      </trans-unit>
    </body>
  </file>
</xliff>

I have also attached the .xlf file; here is a link to download it.

Any suggestions?

Best Answer

I wonder how do I search and replace unicode character "xE5" with "æ"

Note that æ is actually U+00E6, not U+00E5 (U+00E5 is å).
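To check which code point a character really has, a short Python sketch like the following can help. The helper name `describe` is made up for this illustration; `ord` and `unicodedata.name` are standard library calls.

```python
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and the official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


# U+00E5 and U+00E6 are neighbours in the Latin-1 range but different letters.
for ch in ("\u00e5", "\u00e6"):
    print(ch, describe(ch))
```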

Search and replace is not the right way to get the correct characters displayed.

<?xml version="1.0" encoding="utf-8"?>

The declaration above states that the encoding is UTF-8, but the file is actually encoded as ANSI (Notepad++'s name for the system's legacy code page).

You need to convert the file correctly to UTF-8, as follows:

  1. Open Testfile.xlf in Notepad++.

  2. The non-ASCII characters are displayed incorrectly.

  3. Select Menu > Encoding > Encode in ANSI.

  4. The non-ASCII characters are now displayed correctly.

  5. Select all file contents (Ctrl+A).

  6. Select Menu > Encoding > Convert to UTF-8.

  7. Save the file (Ctrl+S).

  8. Close and reopen the file.

  9. The file is now correctly encoded as UTF-8 and the non-ASCII characters display correctly.
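If there are many files to fix, the same conversion can be scripted instead of clicking through Notepad++. The following is only a minimal sketch, not what Notepad++ does internally. It assumes the "ANSI" code page is Windows-1252 (typical for Western European systems), and the function name `convert_to_utf8` is made up for this example.

```python
from pathlib import Path


def convert_to_utf8(src: str, dst: str, source_encoding: str = "cp1252") -> None:
    """Re-encode a file from a legacy single-byte encoding to UTF-8.

    Assumption: the file really is in `source_encoding`. Decoding raises
    UnicodeDecodeError if it contains bytes that encoding does not define.
    """
    raw = Path(src).read_bytes()              # read bytes so line endings stay untouched
    text = raw.decode(source_encoding)        # interpret the bytes as the legacy encoding
    Path(dst).write_bytes(text.encode("utf-8"))  # write the same text back as UTF-8


# Example call, using the file names from this answer:
# convert_to_utf8("Testfile.xlf", "TestfileConverted.xlf")
```

Writing to a separate output file keeps the original intact in case the guessed source encoding turns out to be wrong.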


How can you see the file is actually ANSI?

The Cygwin file utility shows this (before and after conversion):

DavidPostill@Hal /f/test
$ file -i Testfile*.xlf
Testfile.xlf:          application/xml; charset=iso-8859-1
TestfileConverted.xlf: application/xml; charset=utf-8
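Without Cygwin, a rough equivalent check is to test whether the raw bytes decode as strict UTF-8. This sketch is not how the file utility works (it uses its own heuristics), and the function name `looks_like_utf8` is made up for this example. Note that a pure-ASCII file also passes, since ASCII is a subset of UTF-8.

```python
def looks_like_utf8(data: bytes) -> bool:
    """Return True if the bytes form valid UTF-8, False otherwise."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


# A lone 0xE5 byte (å in ISO-8859-1 / Windows-1252) is not valid UTF-8,
# because 0xE5 must be followed by two continuation bytes.
print(looks_like_utf8(b"p\xe5 veje"))        # legacy-encoded bytes
print(looks_like_utf8(b"p\xc3\xa5 veje"))    # the same text encoded as UTF-8
```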