Sql-server – How to bulk convert ISO8859 text in Text columns to UTF8 in nvarchar(max) columns

character-setsql serversql-server-2008-r2type conversionutf-8

I've been struggling for days with a problem on a MS SQL database. Here is the situation: I have an old Open Source PHP application (Limesurvey version 1.92) running on a Windows 2008R2 server, with SQL2008R2 as RDBMS. I need to migrate the surveys in the application to a new instance, running on Linux with MySQL. The new instance is version 3, and database structure between both versions is different. The scenario I tried to use is upgrading the old version on the old server, and then exporting the surveys as export files, which I can import into the new application. It works well. Except that I'm stuck because of a problem of character encoding. The old application (web application) uses UTF-8. But the database uses only VARCHAR and TEXT columns. Therefore UTF-8 characters are stored using several bytes in these columns : é is stored as é for example. When I upgrade the application, the upgrade script changes the structure of the tables: TEXT columns become NVARCHAR(MAX), but the content in the column is not converted to UTF-8. So I'm searching for a way to change the content in the NVARCHAR(MAX) columns from ISO8859 to UTF-8. The database has approx 200 tables and is 600MB.

Best Answer

SQL Server 2008r2 dopes not support UTF8 (such support does not arrive until the upcoming 2019 release) it only supports UCS2 (essentially a subset of UTF16). Because of this there is unlikely to be a practical solution in SQL Server itself.

You may have to code up a tool to pull the data out, convert it, and update the DB with the newly converted values.

On rereading the question, I notice you mention mysql as the target database, though your tags do not mention that. The types you mention (NVARCHAR etc.) are not mysql types. You may need to clarify the question and update the tags.