Measuring Character Set Conversion

oracle, oracle-12c

We have an Oracle 12c (12.2.0.1) database server set up with the AL32UTF8 character set (the Unicode UTF-8 universal character set), as per Oracle's recommendation.

However, the application that uses this database comes with a vendor recommendation to use the WE8MSWIN1252 (MS Windows Code Page 1252, 8-bit West European) character set instead.

I know, at least in general terms, how to convert from one character set to another, using either the Database Migration Assistant for Unicode (DMU) or a full Export/Import.

But there are two areas that I'm not clear on.

One is that Oracle says there is a potential for data truncation issues (and explains how and why). Is there any way to find out whether that's an issue in my database, short of converting a test copy and doing a row-by-row comparison between the converted and unconverted data?

More importantly, since the vendor wants us on a character set that isn't the Oracle-recommended default, is there any way to measure what performance impact, if any, the conversion may have? From what I can tell, the server has to convert incoming data from WE8MSWIN1252 to AL32UTF8 on inserts/updates and reverse that on selects, right? Are there any views that would show how much time the database spends on these conversions? I can't imagine the performance hit is big enough to justify the conversion, but I'd like to know for sure.
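
For reference, this is how I confirm what the database side is currently set to (my understanding is that the client side of any conversion is governed by each client's NLS_LANG setting):

    -- Show the database character sets currently in effect
    SELECT parameter, value
    FROM   nls_database_parameters
    WHERE  parameter IN ('NLS_CHARACTERSET', 'NLS_NCHAR_CHARACTERSET');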

Best Answer

The old CSSCAN utility used to do this, but have you actually run the DMU? The DMU scans your whole database and lists every table that would suffer a lossy conversion, as a preview before it actually performs the conversion. You are more likely to end up with lossy data going from a multi-byte character set like AL32UTF8 to a single-byte one like WE8MSWIN1252. There are many reasons data becomes lossy; a common one is databases being accessed by multiple applications and tools on multiple operating systems, with locale/NLS_LANG set correctly or not. If NLS_LANG is set correctly, Oracle does a single-pass conversion between the source and target databases, though that depends on the application.
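
If you want a quick spot check before (or instead of) a full scan, a round trip through CONVERT will flag rows that can't survive the target character set: unmappable characters come back as replacement characters, so the round-tripped value no longer compares equal to the original. A minimal sketch; the customers table and notes column are placeholders for your own:

    -- Find rows whose data would be lossy in WE8MSWIN1252:
    -- convert to the target set and back, then compare to the original.
    SELECT id, notes
    FROM   customers
    WHERE  notes <> CONVERT(
                      CONVERT(notes, 'WE8MSWIN1252', 'AL32UTF8'),
                      'AL32UTF8', 'WE8MSWIN1252');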

Oracle does state that there is a performance overhead (perhaps 5%) to using a multi-byte character set, but it recommends AL32UTF8 as the default because it covers every character used worldwide. I don't know of any view that reports conversion times; that work happens inside the RDBMS itself.
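
The closest you can get to a number is a crude micro-benchmark of the conversion routine itself, e.g. timing CONVERT in a PL/SQL loop with DBMS_UTILITY.GET_TIME. This is a rough sketch only; it isolates the conversion cost in isolation and is no substitute for load-testing your actual application:

    -- Time one million AL32UTF8 -> WE8MSWIN1252 conversions.
    SET SERVEROUTPUT ON
    DECLARE
      t0      PLS_INTEGER := DBMS_UTILITY.GET_TIME;  -- centiseconds
      v       VARCHAR2(4000);
      sample  VARCHAR2(100) := 'Grüße: a test string with accented characters';
    BEGIN
      FOR i IN 1 .. 1000000 LOOP
        -- Appending i keeps the optimizer from hoisting the call out of the loop.
        v := CONVERT(sample || i, 'WE8MSWIN1252', 'AL32UTF8');
      END LOOP;
      DBMS_OUTPUT.PUT_LINE(
        'Elapsed: ' || (DBMS_UTILITY.GET_TIME - t0) / 100 || ' seconds');
    END;
    /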

We personally have chosen WE8MSWIN1252 on our databases for two reasons:

  1. It was going to take too long to convert from US7ASCII to AL32UTF8
  2. It covers most Western European languages, and we have separate AL32UTF8 databases for our eastern countries

Hope that helps!