Windows – How to convert html entities in Oracle

character-setoracleoracle-11gwindowsxml

I'm trying to decode html containing html entities.
I've tried dbms_xmlgen.convert and utl_i18n.unescape_reference but the results are less than satisfactory.

SET ESCAPE ON;
SELECT
   'dbms_xmlgen.convert' AS method,
   dbms_xmlgen.convert('\♥', 1) AS hearts,
   dbms_xmlgen.convert('\&',    1) AS amp_ent,
   dbms_xmlgen.convert('\&',  1) AS amp_dec,
   dbms_xmlgen.convert('\&', 1) AS amp_hex,
   dbms_xmlgen.convert('\激\光', 1) AS chinese_laser 
FROM dual
UNION ALL
SELECT
   'utl_i18n.unescape_reference',
   utl_i18n.unescape_reference('\♥'),
   utl_i18n.unescape_reference('\&'),
   utl_i18n.unescape_reference('\&'),
   utl_i18n.unescape_reference('\&'),
   utl_i18n.unescape_reference('\激\光') 
FROM dual;

The results I get are:

METHOD                        HEARTS      AMP_ENT     AMP_DEC     AMP_HEX     CHINESE_LASER
----------------------------------------------------------------------------------------------
dbms_xmlgen.convert           ♥    &           &     &    激光
utl_i18n.unescape_reference   ¿           &           &           &           ¿¿

My actual problem involves Chinese characters, processed by a Java program to create PDF reports. I have no easy access to the Java code, but I do have control over the query used by the program.

An example of Chinese characters that I use for testing is 激光, which Google Translate tells me means 'laser' and which I receive encoded as 激光. These aren't decoded properly, as can be seen in the example above.

I realise that in the second row, the inverted question marks seem to indicate that the entities were converted, but can't be displayed properly. However, is this Oracle itself doing this, or the client (I tried in both SQL+ and Toad)? When I plug utl_i18n.unescape_reference into the query used by the Java program, it works for entities like ± (±), but again, not for the Chinese characters.

How can I have all entities properly decoded?

  • Should I use another function? (These were recommended somewhere on the Internet).
  • Should I change some settings? (Relevant settings shown below).

Relevant information

Oracle Database 11g Enterprise Edition Release 11.1.0.7.0 - 64bit Production
SQL*Plus: Release 10.1.0.5.0

NLS_LANGUAGE                           AMERICAN
NLS_TERRITORY                          AMERICA
NLS_CHARACTERSET                       WE8MSWIN1252
NLS_NCHAR_CHARACTERSET                 AL16UTF16

Best Answer

The chinese characters are decoded correctly with utl_i18n.unescape_reference. In fact they just aren't displayed properly in your query result which might not support those special characters.

You can confirm that with this SQL Fiddle.

It is the client which is respsonsible to display the characters appropriately. If the client can not display a character it may show an upside down question mark, something else or just plain garbage.

So by now, the real question is where do you want to display this string...