Sql-server – UTF8 Trouble while migrating from MSSQL to MySQL with MySQL Workbench

character-setmigrationMySQLmysql-workbenchsql server

I am tasked with migrating part of our database from an on-site MSSQL server to an Amazon RDS MySQL. This will be my first attempt with a database migration, so I am doing a dry run from our MSSQL server to a locally installed instance of MySQL using MySQL Workbench. Local install MySQL Workbench version 6.3. MySQL Server version 5.7

  • Error generated: Inserting Data: Incorrect string value: '\xA0\xA0 I-D...' for column 'Address1' at row 77 while performing bulk data transfer.
    • This happens twice in the same table (two different rows). All other tables are successful.

Symptoms:

  • On the Object Migration -> Manual Editing -> Column Mappings screen, I see a list of migration messages/warnings.
    • On columns that go from source VARCHAR/NVARCHAR to VARCHAR, I get "Collation SQL_Latin1_General_CP1_CI_AS migrated to utf8_general_ci. Same for Char to Char.
    • On columns that go from source BIT to TINYINT(1), I get "Source column type BIT was migrated to TINYINT(1)

Settings:

  • Source Selection is MSSQL Server, ODBC (Free TDS)
    • Advanced tab has "Driver sends Unicode data as UTF-8" checked. Required for Free TDS
  • Target Selection is local MySQL. Connection method is Standard(TCP/IP)

I have read THIS article which seems quite promising, but I am not sure how to implement step 3 as it is a migration and when I get to the "Create Schemas" step, it says it will drop the schema and recreate anew if it already exists.

  • All the indexes on this database are on small columns, so the 3-bit to 4-bit utf8 will not be an issue for indexes.

  • As far as "Modify connection, client, and server character sets" to be utf8mb4, I cannot even find the file to modify. I went to MySQL Server 5.7/my-default.ini and changed it to include

    [client]
    default-character-set = utf8mb4
    
    [mysql]
    default-character-set = utf8mb4
    
    [mysqld]
    character-set-client-handshake = FALSE
    character-set-server = utf8mb4
    collation-server = utf8mb4_unicode_ci
    

    as requested, but my variables still look like the below picture, so that is obviously not the right file. Plus, will I actually have access to this on the Amazon RDS instance?

enter image description here

Any thoughts?

Best Answer

Hex A0 probably came from a Word document, where it is a "hard space".

It appears that you did not specify that the source was 'latin1' and the destination is 'utf8'. Otherwise, A0 should have turned into C2A0. That seems correct. However, this seems incorrect: "Driver sends Unicode data as UTF-8". Or perhaps just inconsistent with "Collation SQL_Latin1_General_CP1_CI_AS migrated to utf8_general_ci". Did you do anything in ODBC to tell it to convert latin1 to utf8?

You have an inconsistency -- some places you mention utf8, some mention utf8mb4. Outside MySQL, there is only "utf8" (or "UTF-8"), but inside MySQL there is a difference between utf8 and utf8mb4. If you intend to handle Chinese or Emoji, go with utf8mb4.