Oracle DB export, possible charset conversion if NLS_LANG not set

character-setexportoracle

I need to export an Oracle database from one server and import it on another server. But I am getting the following information message at the beginning:

Connected to: Oracle Database 10g Release 10.2.0.4.0 – 64bit
Production Export done in US7ASCII character set and AL16UTF16 NCHAR
character set server uses UTF8 character set (possible charset
conversion)

Although the export still terminates successfully without warning, I was wondering whether the data in the database is still correct. Is there a need to set the NLS_LANG variable to match the character set of the server (in this case UTF-8)?

If I set the NLS_LANG variable to UTF-8 I don't get this information message. But the target server where I import the database again shows the following:

import done in UTF8 character set and AL16UTF16 NCHAR character set
import server uses AL32UTF8 character set (possible charset
conversion)

If I understand that correctly these two servers use (slightly) different character sets – UTF-8 and AL32UTF8.

I am confused on what I should set the NLS_LANG variable or even whether it is necessary to set. Should I use AL32UTF8 as NLS_LANG for the export and import?

EDIT: In case this information is important: I am using a SUSE Linux and for the export I use the command exp that comes with Oracle.

Best Answer

First of all, character set UTF-8 does not exist on Oracle, use AL32UTF8 or UTF8 (without the hyphen).

Usually you should get an error when your client character set is UTF-8:

$ setenv NLS_LANG AMERICAN_AMERICA.UTF-8
$ sqlplus ...

ERROR:
ORA-12705: Cannot access NLS data files or invalid environment specified

Character set UTF8 is identical to AL32UTF8 for characters of BMP (Basic Multilingual Plane), i.e. the first 65536 characters of Unicode. Most likely all characters in your application fall in this range, so it does not make any difference. AL32UTF8 is what is commonly called UTF-8 encoding. Oracle character set UTF8 is commonly known as CESU-8.

If you do not set any NLS_LANG variable then Oracle defaults it to AMERICAN_AMERICA.US7ASCII. Setting your client character set to US7ASCII is like you tell the Oracle Database server: "My client is able to display (only) 7-bit ASCII characters and files are saved in 7-bit ASCII format." Thus the database will replace all russian/cyrillic characters to ? or ¿ at export.

Set your NLS_LANG to AL32UTF8 (no matter which database character sets you have in place), then you will get any characters properly exported and imported.

For more information check also this document: NLS_LANG FAQ