Sql-server – How to bulk convert ISO8859 text in Text columns to UTF8 in nvarchar(max) columns

character-set, sql server, sql-server-2008-r2, type conversion, utf-8

I've been struggling for days with a problem on an MS SQL database. Here is the situation: I have an old open-source PHP application (LimeSurvey version 1.92) running on a Windows Server 2008 R2 machine, with SQL Server 2008 R2 as the RDBMS. I need to migrate the surveys in the application to a new instance running on Linux with MySQL. The new instance is version 3, and the database structure differs between the two versions.

The approach I tried is to upgrade the old version on the old server and then export the surveys as export files, which I can import into the new application. That works well, except that I'm stuck on a character encoding problem.

The old (web) application uses UTF-8, but the database uses only VARCHAR and TEXT columns. As a result, each non-ASCII character is stored as its individual UTF-8 bytes, and each byte is read back as a separate single-byte character: é is stored as Ã©, for example. When I upgrade the application, the upgrade script changes the structure of the tables so that TEXT columns become NVARCHAR(MAX), but the content of the columns is not converted to UTF-8.

So I'm looking for a way to convert the content of the NVARCHAR(MAX) columns from ISO8859 to UTF-8. The database has approximately 200 tables and is 600 MB.

Best Answer

SQL Server 2008 R2 does not support UTF-8 (such support does not arrive until the upcoming 2019 release); it only supports UCS-2 (essentially a subset of UTF-16). Because of this, there is unlikely to be a practical solution in SQL Server itself.

You may have to code up a tool to pull the data out, convert it, and update the DB with the newly converted values.
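As a rough illustration of the conversion step such a tool would need, here is a minimal Python sketch. It assumes the UTF-8 bytes were interpreted as Windows-1252 (the code page used by the common Latin1 collations in SQL Server); pass "latin-1" instead if the data really is ISO 8859-1. The function name fix_mojibake is hypothetical, and reading the rows and writing them back are deliberately left out.

```python
def fix_mojibake(text, codec="cp1252"):
    """Undo UTF-8 text that was mis-read as a single-byte code page.

    Hypothetical helper: `codec` is an assumption about how the database
    interpreted the bytes (cp1252 here; use "latin-1" for strict ISO 8859-1).
    """
    if text is None:
        return None
    try:
        # Recover the original bytes, then decode them as the UTF-8 they really are.
        return text.encode(codec).decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # Not mojibake (already correct, or not representable in `codec`):
        # leave the value untouched rather than damage it.
        return text


sample = "\u00c3\u00a9"        # displays as Ã©, the example from the question
fixed = fix_mojibake(sample)   # the single character é
```

Python's cp1252 codec has no mapping for five byte values (0x81, 0x8D, 0x8F, 0x90, 0x9D), so a value such as Á (UTF-8 bytes C3 81) comes back unchanged from this sketch and would need separate handling. Running the conversion on a copy of the data first and spot-checking the results is advisable.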

On rereading the question, I notice you mention MySQL as the target database, though your tags do not mention it. The types you mention (NVARCHAR etc.) are not MySQL types. You may need to clarify the question and update the tags.

Related Solutions

Mysql – How to convert control characters in MySQL from latin1 to UTF-8

I'm not certain. I tried to start out by reproducing your problem, but the ALTER worked fine for me.

test > CREATE TABLE `bar` (  `content` text ) ENGINE=MyISAM DEFAULT CHARSET=latin1;  INSERT INTO bar VALUES (0x8081828384858687898A8B8C8D8E8F909192939495969798999A9B9C9D9E9F);
Query OK, 0 rows affected (0.02 sec)

Query OK, 1 row affected (0.00 sec)

test > ALTER TABLE bar CHANGE content content TEXT CHARACTER SET UTF8;
Query OK, 1 row affected (0.04 sec)
Records: 1  Duplicates: 0  Warnings: 0

test > select * from bar;
+---------------------------------+
| content                         |
+---------------------------------+
| ����������������������������� |
+---------------------------------+
1 row in set (0.00 sec)

test > set names utf8;
Query OK, 0 rows affected (0.00 sec)

test > select * from bar;
+---------------------------------------------------------------------------------+
| content                                                                         |
+---------------------------------------------------------------------------------+
| €‚ƒ„…†‡‰Š‹ŒŽ‘’“”•–—˜™š›œžŸ |
+---------------------------------------------------------------------------------+
1 row in set (0.00 sec)
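The difference between the two SELECT results above is consistent with the connection character set rather than with the stored data: with latin1 results the server sends single bytes that a UTF-8 terminal cannot decode, and after SET NAMES utf8 it sends UTF-8. A small Python sketch of that reading follows. It assumes MySQL's latin1 behaves like Windows-1252 for these bytes and that the terminal expects UTF-8; the variable names are illustrative only.

```python
# Four of the bytes inserted above, as stored in the latin1 column.
stored = bytes.fromhex("80828384")

# Assumption: MySQL's latin1 matches Windows-1252 for these bytes, so
# ALTER ... CHARACTER SET utf8 transcodes each byte to a real character.
converted = stored.decode("cp1252")

# Connection still latin1: the server sends single bytes, which a UTF-8
# terminal cannot decode and renders as replacement characters.
seen_with_latin1_results = converted.encode("cp1252").decode("utf-8", errors="replace")

# After SET NAMES utf8: the server sends UTF-8, which the terminal decodes.
seen_with_utf8_results = converted.encode("utf-8").decode("utf-8")
```

Under these assumptions the first value consists only of U+FFFD replacement characters, while the second matches the converted text, which mirrors the two result sets in the session.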

Here are my related character set settings:

test > show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

Edit

My char settings before running set names utf8

test > show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | latin1                     |
| character_set_connection | latin1                     |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | latin1                     |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (0.00 sec)

Version

test > select version();
+-------------------------+
| version()               |
+-------------------------+
| 5.1.41-3ubuntu12.10-log |
+-------------------------+
1 row in set (0.00 sec)

Sql-server – Convert all ntext columns to nvarchar(max)

OK, I've rewritten your SP for you with QUOTENAME and the schema, as @MartinSmith suggested. That said, I'm not quite sure why you are using an SP for this. A standalone piece of code without the SP wrapper would seem to make more sense for something like this.

You use QUOTENAME to deal with odd characters in names. For example, Table-Test is a valid table name, but it won't work in your code unless you put brackets around it: [Table-Test]. QUOTENAME takes care of that for you. It also handles names that happen to contain a ']'. It is generally considered a best practice, when writing dynamic SQL, to include schemas and to use QUOTENAME where needed.
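To make the quoting rule concrete, here is a small Python sketch of what QUOTENAME does with its default bracket delimiter. The function quotename below is a hypothetical stand-in written for illustration; it is not part of any library, and in T-SQL you would simply call QUOTENAME itself.

```python
def quotename(name):
    """Illustrative stand-in for T-SQL QUOTENAME with the default [] delimiters."""
    # QUOTENAME accepts at most 128 characters and returns NULL beyond that.
    if name is None or len(name) > 128:
        return None
    # A closing bracket inside the name is escaped by doubling it.
    return "[" + name.replace("]", "]]") + "]"


plain = quotename("Table-Test")   # "[Table-Test]"
tricky = quotename("a]b")         # "[a]]b]"
```

The doubling of ']' is what keeps a name containing a bracket from terminating the identifier early, which is why concatenating the brackets by hand is not a safe substitute in dynamic SQL.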

USE [EventLogic]
GO

/****** Object:  StoredProcedure [dbo].[usp_SL_ConvertNtextToNvarchar]    Script Date: 08/08/2013 16:28:58 ******/
SET ANSI_NULLS ON
GO

SET QUOTED_IDENTIFIER ON
GO

ALTER PROCEDURE [dbo].[usp_SL_ConvertNtextToNvarchar]
AS
/*
*/
BEGIN
  SET NOCOUNT ON;

  DECLARE @sql nvarchar(max)
  DECLARE @table_schema nvarchar(128)
  DECLARE @table_name nvarchar(128)
  DECLARE @column_name nvarchar(128)
  DECLARE @totalCount int
  DECLARE @count int

  SET @totalCount = 0;
  SET @count = 0;
  SET @sql = '';

  -- Eventlogic
  DECLARE tables_cursor CURSOR FOR 
    SELECT SCHEMA_NAME(so.schema_id) AS table_schema, so.name as table_name, 
            sc.name as column_name
      FROM sys.objects so
      JOIN sys.columns sc ON so.object_id = sc.object_id
      JOIN sys.types stp ON sc.user_type_id = stp.user_type_id
                        AND stp.name = 'ntext'
     WHERE so.type = 'U' -- to show only user tables
  OPEN tables_cursor

  FETCH NEXT FROM tables_cursor INTO @table_schema, @table_name, @column_name
  WHILE @@FETCH_STATUS = 0
  BEGIN
    SET @sql = 'ALTER TABLE Eventlogic.'+QUOTENAME(@table_schema)+'.' + 
                    QUOTENAME(@table_name) + ' ALTER COLUMN ' + 
                    QUOTENAME(@column_name) + ' nvarchar(max);'
    EXEC sp_executesql @sql
    --PRINT @sql

    SET @sql = 'UPDATE Eventlogic.'+QUOTENAME(@table_schema)+'.' + 
                    QUOTENAME(@table_name) + ' SET ' + 
                    QUOTENAME(@column_name) + '=' + 
                    QUOTENAME(@column_name) + ' '
    EXEC sp_executesql @sql
    --PRINT @sql

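    -- @count is a running tally of converted columns. It is never reset,
    -- so the total is its latest value (adding it on every pass would over-count).
    SET @count = @count + 1;
    PRINT ('Eventlogic.'+@table_schema+'.' + @table_name + '.' + @column_name + ' ' + CAST(@count AS nvarchar(10)))
    SET @totalCount = @count;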
    FETCH NEXT FROM tables_cursor INTO @table_schema, @table_name, @column_name
  END 
  CLOSE tables_cursor
  DEALLOCATE tables_cursor
  PRINT ('Total columns updated: ' + CAST(@totalCount AS nvarchar(10)))  

END;

GO
