Datapump metadata-only import changes datalength when importing with conversion

data conversionoracleunicode

Datapump metadata only import changes datalength when importing with conversion: (export done in WE8ISO8859P15 and import done in AL32UTF8 character set and AL16UTF16 NCHAR character set)

Example table:
Source-SYSTEM (WE8ISO8859P15):

select OWNER,TABLE_NAME,COLUMN_NAME,DATA_LENGTH from dba_tab_columns where TABLE_NAME='STRING';

OWNER TABLE_NAME COLUMN_NAME DATA_LENGTH
--------------- -------------------- -------------------- -----------
SCHEMA_NAME STRING DE 2000

Target-SYSTEM (AL16UTF16):

select OWNER,TABLE_NAME,COLUMN_NAME,DATA_LENGTH from dba_tab_columns where TABLE_NAME='STRING';

OWNER TABLE_NAME COLUMN_NAME DATA_LENGTH
--------------- -------------------- -------------------- -----------
SCHEMA_NAME STRING DE 4000

The data length of varchar2(char 2000) has been changed to varchar2(char 4000) automatically. What is the rule set for this behavior? Is this a documented behavior?

Edit: The issue seem to be not related to datapump but to the conversion from single character to multicharacter set:

Source-SYSTEM (WE8ISO8859P15):

create table test(name varchar2(50), name2 varchar2(5 char), name3 clob);

INSERT INTO test VALUES('Susanne','Test','Hi This is Row one');

select owner,TABLE_NAME, COLUMN_NAME, DATA_LENGTH, CHAR_LENGTH, data_type,char_used FROM ALL_TAB_COLUMNS where TABLE_NAME='TEST';
SCHEMA_NAME TEST    NAME    50  50  VARCHAR2    B
SCHEMA_NAME TEST    NAME2   5   5   VARCHAR2    C
SCHEMA_NAME TEST    NAME3   4000    0   CLOB    

Target-SYSTEM (AL16UTF16):

create table test(name varchar2(50), name2 varchar2(5 char), name3 clob);

INSERT INTO test VALUES('Susanne','Test','Hi This is Row one');

select owner,TABLE_NAME, COLUMN_NAME, DATA_LENGTH, CHAR_LENGTH, data_type,char_used FROM ALL_TAB_COLUMNS where TABLE_NAME='TEST';
SCHEMA_NAME TEST    NAME    50  50  VARCHAR2    B
SCHEMA_NAME TEST    NAME2   20  5   VARCHAR2    C
SCHEMA_NAME TEST    NAME3   4000    0   CLOB    

Note: In old DB (character set = WE8ISO8859P15)
DATA_LENGTH=1*CHAR_LENGTH for all columns having VARCHAR2 DATA_TYPE (indipendent of CHAR_USED).

In new DB (with UNICODE character set AL32UTF8)
DATA_LENGTH=4*CHAR_LENGTH (or max-value = 4000) for all columns having VARCHAR2 DATA_TYPE and CHAR_USED=C.

Best Answer

If we create VARCHAR2(5) or VARCHAR2(5 BYTE), we put constraint on max 5 bytes.

If we create VARCHAR2(5 CHAR), we put constraint on max 5 characters.

In WE8ISO8859P15 every character consists of 1 byte, so VARCHAR2(5) is equivalent to VARCHAR2(5 CHAR) and DATA_LENGTH = CHAR_LENGTH.

In AL16UTF16 every character consists of 2 bytes and in all cases DATA_LENGTH = 2 * CHAR_LENGTH.

In AL32UTF8 character may consist of 1-4 bytes. So VARCHAR2(5) and VARCHAR2(5 BYTE) will put constraint on max 5 bytes (five 1-byte characters will fit, two 4-byte characters will not fit). And VARCHAR2(5 CHAR) will put constraint on max 5 characters, so in worst scenario five 4-byte characters will fit and DATA_LENGTH will be 4 * CHAR_LENGTH.