Character Set Encoding in MySQL Tables

character-setMySQL

I have the following data:Trollhättan

  • If I print out and setting up the header to utf-8, then this is the output in the browser.

  • If I'm not setting up the utf-8 in the header, Trollhättan. However,when I'm storing the data in the database and checking through phpmyadmin, I am getting the following string : Trollhättan.

  • When I am setting up the header to utf-8, I'm getting this string : Trollhättan beside this Trollhättan.

Since the table is latin1_swedish_ci, do I have to use in the table UTF-8?

Best Answer

Here is the list of collations under UTF-8

mysql> select * from information_schema.collations where CHARACTER_SET_NAME = 'utf8';
+--------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME     | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN |
+--------------------+--------------------+-----+------------+-------------+---------+
| utf8_general_ci    | utf8               |  33 | Yes        | Yes         |       1 |
| utf8_bin           | utf8               |  83 |            | Yes         |       1 |
| utf8_unicode_ci    | utf8               | 192 |            | Yes         |       8 |
| utf8_icelandic_ci  | utf8               | 193 |            | Yes         |       8 |
| utf8_latvian_ci    | utf8               | 194 |            | Yes         |       8 |
| utf8_romanian_ci   | utf8               | 195 |            | Yes         |       8 |
| utf8_slovenian_ci  | utf8               | 196 |            | Yes         |       8 |
| utf8_polish_ci     | utf8               | 197 |            | Yes         |       8 |
| utf8_estonian_ci   | utf8               | 198 |            | Yes         |       8 |
| utf8_spanish_ci    | utf8               | 199 |            | Yes         |       8 |
| utf8_swedish_ci    | utf8               | 200 |            | Yes         |       8 |
| utf8_turkish_ci    | utf8               | 201 |            | Yes         |       8 |
| utf8_czech_ci      | utf8               | 202 |            | Yes         |       8 |
| utf8_danish_ci     | utf8               | 203 |            | Yes         |       8 |
| utf8_lithuanian_ci | utf8               | 204 |            | Yes         |       8 |
| utf8_slovak_ci     | utf8               | 205 |            | Yes         |       8 |
| utf8_spanish2_ci   | utf8               | 206 |            | Yes         |       8 |
| utf8_roman_ci      | utf8               | 207 |            | Yes         |       8 |
| utf8_persian_ci    | utf8               | 208 |            | Yes         |       8 |
| utf8_esperanto_ci  | utf8               | 209 |            | Yes         |       8 |
| utf8_hungarian_ci  | utf8               | 210 |            | Yes         |       8 |
| utf8_sinhala_ci    | utf8               | 211 |            | Yes         |       8 |
+--------------------+--------------------+-----+------------+-------------+---------+
22 rows in set (0.03 sec)

latin1_swedish_ci belongs to latin1

mysql> select * from information_schema.collations where COLLATION_NAME = 'latin1_swedish_ci';
+-------------------+--------------------+----+------------+-------------+---------+
| COLLATION_NAME    | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
+-------------------+--------------------+----+------------+-------------+---------+
| latin1_swedish_ci | latin1             |  8 | Yes        | Yes         |       1 |
+-------------------+--------------------+----+------------+-------------+---------+
1 row in set (0.03 sec)

The closest utf8 for your collation is ID 200

mysql> select * from information_schema.collations where ID = 200;
+-----------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME  | CHARACTER_SET_NAME | ID  | IS_DEFAULT | IS_COMPILED | SORTLEN |
+-----------------+--------------------+-----+------------+-------------+---------+
| utf8_swedish_ci | utf8               | 200 |            | Yes         |       8 |
+-----------------+--------------------+-----+------------+-------------+---------+
1 row in set (0.00 sec)

CAVEAT

You may have to experiment with latin1, change the character set and collation of the individual column in the table, or possibly both. At the very least, use latin1 for display in phpmyadmin.

You could tweek and experiment with making the entire database a specific character set and collation using ALTER DATABASE.

ALTER DATABASE dbname CHARACTER SET utf8 COLLATE utf8_general_ci;

Make sure you make a backup of the database and load it into a dev/staging DB and then run ALTER DATABASE against dev/staging.