This is a follow-up question to a previous thread I had posted. We're in the process of migrating to a new character set. Now, I am a programmer and we have DBA's who will be performing the work but I'd like to get a little bit more information about the process. Our current character set is WE8ISO8859P1 and uses BYTE character semantics. However, we are migrating to AL32UTF8. A new database was created containing the AL32UTF8 character set using BYTE semantics. The DBA ran a script on some tables that checked to see if there would be any issues (I believe it was CSALTER). After running the script, it said that some of the rows in our table would be truncated. That table contains a VARCHAR2(BYTE) column that's set to 4000 BYTES. The rows that resulted with potential issues all contain special characters. My question is, how can we export/import the tables in our database without having information truncated? Why is it saying that it would be truncated? My first thought was that since we changed the character set, it may be storing some of the characters from the old database table different than when it's stored into the new database with the character set using up more storage.
Character Set Database Migration
character-setmigrationoracle
Related Question
- Sql-server – SQL Server to MySQL migration – how can I remove UCS-2 surrogate pairs in SQL Server
- Oracle Character set identification
- Mysql – Master and slave tables with different character sets – will replication break
- Why is there a difference in the character limit for VARCHAR2 in SQL and PL/SQL
- Measuring Character Set Conversion
- Sql-server – Byte ordering for multibyte characters in SQL Server versus Oracle
Best Answer
The reason for the truncation is quite simple. Some characters (accented ones, for example) in the WE8ISO8859P1 character set are stored as a single byte, but in AL32UTF8 they end up being stored as multiple bytes. As a result of conversion, a 4000 character string may end up actually requiring more than 4000 bytes.
By way of example, this query shows you that the Euro symbol (0x80 in WE8ISO8859P1) becomes 2 bytes in AL32UTF8:
To list all characters that will be affected by the change, you can use the following query:
Unfortunately the maximum length a
CHAR
orVARCHAR
string can be in Oracle is 4000 bytes. The only option available to you if characterset conversion pushes you over this limit is to convert the columns to use theCLOB
datatype, but we warned -CLOB
s are difficult to deal with and can present challenges.