Sql-server – the SQL Server equivalent for Oracle character set AL32UTF8

azure-sql-managed-instancecollationdatatypesoraclesql server

While porting data from oracle to Azure sql database I am facing the following issue where, empty data in Oracle is converted as ➞➞➞ in SQL Server

Pic1: Data as in Oracle DB;

Screenshot data of Oracle Database

Pic2: Data as in Sql server DB;

Screenshot data of SQL Server Database

Pic3: Data when pasted in Notepad++.

Screenshot of the oracle data when pasted in Notepad++

The character set in Oracle is AL32UTF8 and that of target SQL Server database is SQL_Latin1_General_CP1_CI_AS. I have tried changing the datatype to NVARCHAR / NCHAR and changing the collation of the column to SQL_Latin1_General_CP850_CI_AS, but no solution. What is the reason for this difference and is there any solution?

Best Answer

NVARCHAR is always UTF-16.

You can try Latin1_General_100_CI_AS_SC_UTF8 with a VARCHAR column, but that doesn't look like an encoding problem. It really shouldn't matter what the source encoding / collation was as long as you correctly indicate the encoding used to transfer the data.

Either way, those 3 bytes / characters could be a bug in the import tool/process or the export tool/process. How are you transferring the data? If the image you are showing from Notepad++ is from the file being used to migrate the data and the field is supposed to be empty, either the export messed up, or perhaps the field in Oracle is not truly empty.