Character Set Database Migration

character-setmigrationoracle

This is a follow-up question to a previous thread I had posted. We're in the process of migrating to a new character set. Now, I am a programmer and we have DBA's who will be performing the work but I'd like to get a little bit more information about the process. Our current character set is WE8ISO8859P1 and uses BYTE character semantics. However, we are migrating to AL32UTF8. A new database was created containing the AL32UTF8 character set using BYTE semantics. The DBA ran a script on some tables that checked to see if there would be any issues (I believe it was CSALTER). After running the script, it said that some of the rows in our table would be truncated. That table contains a VARCHAR2(BYTE) column that's set to 4000 BYTES. The rows that resulted with potential issues all contain special characters. My question is, how can we export/import the tables in our database without having information truncated? Why is it saying that it would be truncated? My first thought was that since we changed the character set, it may be storing some of the characters from the old database table different than when it's stored into the new database with the character set using up more storage.

Best Answer

The reason for the truncation is quite simple. Some characters (accented ones, for example) in the WE8ISO8859P1 character set are stored as a single byte, but in AL32UTF8 they end up being stored as multiple bytes. As a result of conversion, a 4000 character string may end up actually requiring more than 4000 bytes.

By way of example, this query shows you that the Euro symbol (0x80 in WE8ISO8859P1) becomes 2 bytes in AL32UTF8:

SQL>  select length(convert(chr(128),'AL32UTF8','WE8ISO8859P1')) from dual
  2  /
LENGTH(CONVERT(CHR(128),'AL32UTF8','WE8ISO8859P1'))
---------------------------------------------------
                                                  2
SQL>

To list all characters that will be affected by the change, you can use the following query:

with n as 
(
select level as c from dual
connect by level <= 255
)
select c as "WE8ISO8859P1 value",
       '"'||chr(c)||'"' as Character, 
       length(convert(chr(c),'AL32UTF8','WE8ISO8859P1')) as "New length"
from n
where length(convert(chr(c),'AL32UTF8','WE8ISO8859P1'))>1;

Unfortunately the maximum length a CHAR or VARCHAR string can be in Oracle is 4000 bytes. The only option available to you if characterset conversion pushes you over this limit is to convert the columns to use the CLOB datatype, but we warned - CLOBs are difficult to deal with and can present challenges.