UTF-8 – Causes of Special Characters After Conversion

conversion notepad ultraedit utf-8

One of our work steps involves saving an MS Excel worksheet as CSV and then using UltraEdit to convert the CSV to UTF-8 before importing it into a server system.

The problem is that, after the conversion to UTF-8, the file always starts with 3 nonsense characters:

ï»¿ENTITY_ID;FIELD2;FIELD3;FIELD4;(etc.)
value1;value2;value3;value4;(etc.)

Observations:

  • As you can see, there are 3 characters that are noise and cause the server to reject the CSV import because the first column is not named "ENTITY_ID". The characters are always the same.

  • These characters are not shown right after the conversion, but they do appear once we close and reopen the file in UltraEdit.

  • These characters are only visible in UltraEdit. Neither Windows Notepad nor Notepad++ shows them.

  • Using Notepad++ to convert the CSV to UTF-8 produces exactly the same output: a file with the same 3 odd characters at the beginning. The only difference is that Notepad++ does not display these characters, even after closing and reopening the file.

Workaround:
We reopen the file in UltraEdit, delete the noise, and then the server accepts the CSV import.
This step needs to be eliminated by fixing the actual problem.

Question: How can we avoid these 3 characters?

Best Answer

That's the byte order mark (BOM), encoded as UTF-8: the three bytes EF BB BF. Tell your editor not to add it at the beginning of the file, or use a real decoder in your server system, one that recognizes and skips the BOM.
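
If neither option is available, the BOM can also be removed in a small post-processing step before the import. Below is a minimal sketch in Python. The helper names `strip_utf8_bom` and `decode_csv` are made up for illustration, and nothing here is specific to UltraEdit or to your server system. It relies only on the standard library: `codecs.BOM_UTF8` is the byte sequence EF BB BF, and the `utf-8-sig` codec skips a leading BOM when decoding.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM (EF BB BF), if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


def decode_csv(data: bytes) -> str:
    """Decode UTF-8 bytes to text; 'utf-8-sig' drops a leading BOM if there is one."""
    return data.decode("utf-8-sig")


if __name__ == "__main__":
    # Hypothetical sample: what an editor writes when "UTF-8 with BOM" is selected.
    sample = codecs.BOM_UTF8 + b"ENTITY_ID;FIELD2\nvalue1;value2\n"

    cleaned = strip_utf8_bom(sample)
    first_column = cleaned.split(b";")[0]
    print(first_column)  # first header name, now without the 3 leading bytes
```

An editor that interprets the file as Windows-1252 instead of UTF-8 renders those three bytes as "ï»¿", which would explain why some editors show noise at the start of the file while others hide it.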
