CONVERT() returns garbage with CLOBs in Oracle

character-setoracleutf-8

I am looking to repair a table of data in Oracle, ideally via unprivileged SQL, which has had UTF-8 data inserted into a UTF-8 database, using the Latin-1 character set by mistake.

The symbol β GREEK SMALL LETTER BETA should have gone into the database, but instead the two characters β have gone in…as the two UTF-8 characters
Î LATIN CAPITAL LETTER I WITH CIRCUMFLEX followed by ² SUPERSCRIPT TWO.

This example code demonstrates the problem and the fix, however it only works with VARCHAR columns. As soon as a CLOB is used, the conversion fails:

-- This must return AL32UTF8 for this example to be valid
SELECT VALUE FROM NLS_DATABASE_PARAMETERS WHERE PARAMETER='NLS_CHARACTERSET';

CREATE TABLE EXAMPLE (T VARCHAR2(20));
INSERT INTO EXAMPLE (T) VALUES ('Example β');
SELECT T FROM EXAMPLE; -- Should return 'Example β'
SELECT CONVERT(T, 'WE8ISO8859P1') FROM EXAMPLE;
UPDATE EXAMPLE SET T=CONVERT(T, 'WE8ISO8859P1');
SELECT T FROM EXAMPLE; -- Should return 'Example β', problem fixed
DROP TABLE EXAMPLE;

However if the VARCHAR2(20) is changed to CLOB then this no longer works. CONVERT() returns garbage characters. I can use TO_CHAR() to work around the problem, but eventually I get an error that the CLOB is longer than 4000 chars so TO_CHAR() fails.

Is there a way to get the above example working, when using a CLOB column that's longer than 4000 chars?

Best Answer

DBMS_LOB package has conversion functions as well. Unfortunately there is no support for converting CLOB to CLOB and change characterset in one step, so the data is first converted to BLOB, then back to CLOB.

CREATE TABLE EXAMPLE (T CLOB);
INSERT INTO EXAMPLE (T) VALUES ('Example β ');

begin
  for i in 1..12
  loop
    update example set t = t || t;
  end loop;
  commit;
end;
/

select dbms_lob.getlength(t) from example;

DBMS_LOB.GETLENGTH(T)
---------------------
                45056

Now the conversion part:

create or replace function repair_data (p_clob clob) return clob as
  l_blob blob; 
  l_dest_offset number := 1;
  l_src_offset number := 1;
  l_lang_context number := 0;
  l_warning number;
  l_result clob;
begin
  dbms_lob.createtemporary(l_blob, true, dbms_lob.call);
  dbms_lob.createtemporary(l_result, true, dbms_lob.call);
  dbms_lob.converttoblob(l_blob, p_clob, dbms_lob.lobmaxsize, l_dest_offset, l_src_offset,
                         nls_charset_id('WE8ISO8859P1'), l_lang_context, l_warning);
  l_dest_offset := 1;
  l_src_offset := 1;
  l_lang_context := 0;
  dbms_lob.converttoclob(l_result, l_blob, dbms_lob.lobmaxsize, l_dest_offset, l_src_offset,
                         nls_charset_id('AL32UTF8'), l_lang_context, l_warning);
  return l_result;
end;
/

select repair_data(T) from example;

REPAIR_DATA(T)                          
----------------------------- 
Example β Example β Example β ...

update example set t = repair_data(t);
...