Mysql – Using CONVERT TO CHARACTER SET for changing only collation

character-setcollationMySQLmysql-5.7percona

I think the answer to this is an obvious yes, but I'm having trouble finding confirmation on it. All the questions related to charsets seems to be about changing the charset and not only collations.

So I have tables in utf8mb4_general_ci that I want to convert to utf8mb4_unicode_ci. Can I safely run the following query to get the task done?

ALTER TABLE mytable CONVERT TO CHARACTER SET utf8mb4 COLLATE utf8mb4_unicode_ci

I would run that on all tables that have a table collation in information_schema set to utf8mb4_general_ci. Anything I need to be aware of before beginning this task? Should I double check that the columns in the table also have the same charset set? I've received a warning that even just changing the collation can cause charset issues to pop up afterwards, but not sure how much truth there is to that…

I also have latin1-tables and utf8-tables in these databases to convert. But I think I'll just try to tackle one challenge at a time.. Since I don't want to have the column types changed, I guess I'll have to do it in a more complicated way than using that command.

(I'm trying to fix charset inconsistencies between tables in numerous databases we have)

So I would like to start by unifying the collations on those tables already utf8mb4 and then just continue gradually from there.

Server version: 5.7.16-10-log Percona Server (GPL), Release '10', Revision 'a0c7d0d'

Best Answer

There are a few options for managing character set conversions. As you have found, there is the alter table option, and you can use the character set clause or the collate clause or both in your statements.

Other options include changing the character set and collation for the whole database (which I can see you don't want to do just yet).

Or there's a tool in the free and open source Percona Toolkit that is very popular with developers, pt-online-schema-change, that helps you manage such migrations for tables with a primary key.

David Berube, a consultant independent from Percona, wrote a detailed blog post on our community blog about some of the problems that you can encounter when changing character sets and collations. You can read that at https://www.percona.com/community-blog/2018/06/12/character-sets-migrating-utf8mb4-pt_online_schema_change/

The potential pitfalls included database version compatibility, application "expectations", key length management (the length could change), and the fact that you could see false positives i.e. a column that says it's latin1 may not be!

This time on the Percona website, there are a number of blog posts and also a free webinar Troubleshooting Issues with MySQL Character Sets https://www.percona.com/resources/webinars/troubleshooting-issues-mysql-character-sets

Hope some of these will help you work out the best way to approach your scenarios.

-- Disclosure: I work for Percona.