You have the option of setting character sets/collations on the fly:
While trial-and-error may be necessary, don't go willy-nilly on these variables in /etc/my.cnf. You are better off setting them dynamically during any trial-and-error testing.
To make sure of any corner cases, look at the initial variables of any mysqldump and see if character sets or collations are set in the beginning and reset at the bottom.
In fact, here is a sample for a mysqldump:
/*!40101 SET @OLD_CHARACTER_SET_CLIENT=@@CHARACTER_SET_CLIENT */;
/*!40101 SET @OLD_CHARACTER_SET_RESULTS=@@CHARACTER_SET_RESULTS */;
/*!40101 SET @OLD_COLLATION_CONNECTION=@@COLLATION_CONNECTION */;
/*!40101 SET NAMES utf8 */;
/*!40103 SET @OLD_TIME_ZONE=@@TIME_ZONE */;
/*!40103 SET TIME_ZONE='+00:00' */;
/*!40014 SET @OLD_UNIQUE_CHECKS=@@UNIQUE_CHECKS, UNIQUE_CHECKS=0 */;
/*!40014 SET @OLD_FOREIGN_KEY_CHECKS=@@FOREIGN_KEY_CHECKS, FOREIGN_KEY_CHECKS=0 */;
/*!40101 SET @OLD_SQL_MODE=@@SQL_MODE, SQL_MODE='NO_AUTO_VALUE_ON_ZERO' */;
/*!40111 SET @OLD_SQL_NOTES=@@SQL_NOTES, SQL_NOTES=0 */;
See the variables
- @OLD_CHARACTER_SET_CLIENT
- @OLD_CHARACTER_SET_RESULTS
- @OLD_COLLATION_CONNECTION
- SET NAMES is hardwired to utf8
You could perhaps set these options
- when calling the mysqldump
- place these options in /etc/my.cnf under
[mysqldump]
group section
- edit these values for existing mysqldumps using perl, awk, etc.
Think about it:
- You are storing data in the database as
latin1
- You are data is handled internally by mysqld as
latin1
If data coming from the OS or from the connection is utf8
, how is mysqld going to treat it?
Rather than guessing or hoping for the best, you could change the incoming character set behavior. With the exception of information_schema
and mysql
, take all your databases and set the default character set to utf8
:
ALTER DATABASE dbname CHARACTER SET utf8;
If you have a specific colllation to go with it, do this:
ALTER DATABASE dbname COLLATE 'utf8_general_ci';
Here are the collations to choose from:
mysql> select * from information_schema.collations where CHARACTER_SET_NAME = 'utf8';
+--------------------+--------------------+-----+------------+-------------+---------+
| COLLATION_NAME | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
+--------------------+--------------------+-----+------------+-------------+---------+
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 |
| utf8_icelandic_ci | utf8 | 193 | | Yes | 8 |
| utf8_latvian_ci | utf8 | 194 | | Yes | 8 |
| utf8_romanian_ci | utf8 | 195 | | Yes | 8 |
| utf8_slovenian_ci | utf8 | 196 | | Yes | 8 |
| utf8_polish_ci | utf8 | 197 | | Yes | 8 |
| utf8_estonian_ci | utf8 | 198 | | Yes | 8 |
| utf8_spanish_ci | utf8 | 199 | | Yes | 8 |
| utf8_swedish_ci | utf8 | 200 | | Yes | 8 |
| utf8_turkish_ci | utf8 | 201 | | Yes | 8 |
| utf8_czech_ci | utf8 | 202 | | Yes | 8 |
| utf8_danish_ci | utf8 | 203 | | Yes | 8 |
| utf8_lithuanian_ci | utf8 | 204 | | Yes | 8 |
| utf8_slovak_ci | utf8 | 205 | | Yes | 8 |
| utf8_spanish2_ci | utf8 | 206 | | Yes | 8 |
| utf8_roman_ci | utf8 | 207 | | Yes | 8 |
| utf8_persian_ci | utf8 | 208 | | Yes | 8 |
| utf8_esperanto_ci | utf8 | 209 | | Yes | 8 |
| utf8_hungarian_ci | utf8 | 210 | | Yes | 8 |
| utf8_sinhala_ci | utf8 | 211 | | Yes | 8 |
+--------------------+--------------------+-----+------------+-------------+---------+
22 rows in set (0.03 sec)
You could also run
mysql> show collation where charset='utf8';
+--------------------+---------+-----+---------+----------+---------+
| Collation | Charset | Id | Default | Compiled | Sortlen |
+--------------------+---------+-----+---------+----------+---------+
| utf8_general_ci | utf8 | 33 | Yes | Yes | 1 |
| utf8_bin | utf8 | 83 | | Yes | 1 |
| utf8_unicode_ci | utf8 | 192 | | Yes | 8 |
| utf8_icelandic_ci | utf8 | 193 | | Yes | 8 |
| utf8_latvian_ci | utf8 | 194 | | Yes | 8 |
| utf8_romanian_ci | utf8 | 195 | | Yes | 8 |
| utf8_slovenian_ci | utf8 | 196 | | Yes | 8 |
| utf8_polish_ci | utf8 | 197 | | Yes | 8 |
| utf8_estonian_ci | utf8 | 198 | | Yes | 8 |
| utf8_spanish_ci | utf8 | 199 | | Yes | 8 |
| utf8_swedish_ci | utf8 | 200 | | Yes | 8 |
| utf8_turkish_ci | utf8 | 201 | | Yes | 8 |
| utf8_czech_ci | utf8 | 202 | | Yes | 8 |
| utf8_danish_ci | utf8 | 203 | | Yes | 8 |
| utf8_lithuanian_ci | utf8 | 204 | | Yes | 8 |
| utf8_slovak_ci | utf8 | 205 | | Yes | 8 |
| utf8_spanish2_ci | utf8 | 206 | | Yes | 8 |
| utf8_roman_ci | utf8 | 207 | | Yes | 8 |
| utf8_persian_ci | utf8 | 208 | | Yes | 8 |
| utf8_esperanto_ci | utf8 | 209 | | Yes | 8 |
| utf8_hungarian_ci | utf8 | 210 | | Yes | 8 |
| utf8_sinhala_ci | utf8 | 211 | | Yes | 8 |
+--------------------+---------+-----+---------+----------+---------+
22 rows in set (0.00 sec)
mysql>
To see the individual charset of a database run this:
mysql> show create database sample;
+----------+-------------------------------------------------------------------+
| Database | Create Database |
+----------+-------------------------------------------------------------------+
| sample | CREATE DATABASE `sample` /*!40100 DEFAULT CHARACTER SET latin1 */ |
+----------+-------------------------------------------------------------------+
1 row in set (0.00 sec)
mysql>
As for the settings, you could try this:
Add the lines to my.cnf
[mysqld]
character_set_database=utf8
character_set_server=utf8
then restart mysql
I discussed this back on Aug 01, 2011 : Character Set Encoding in a Table
CAVEAT (For MySQL DB Servers in Windows)
These commands
ALTER DATABASE dbname CHARACTER SET utf8;
ALTER DATABASE dbname COLLATE 'utf8_general_ci';
do not work in the Windows Version of MySQL because of the way Windows locks files. The file needed is called db.opt
which is located in the database subfolder in datadir
.
You may have to do the following:
- mysqldump that database (no database create info, just table creation and INSERTs)
- drop that database
- create database with the specific charset and collation
- reload in the dump into it
EPILOGUE
No matter what you do, please perform any changes on a Dev/Staging Server to see if you get the desired effects
UPDATE 2012-12-05 11:00 EDT
Your Questions
Should I really change it?
To guarantee the proper treatment of the data, you may want to make sure you have apples-to-apples. Data prepared as one charset and loading it into a table with the database possibly aligning the data as if it sees another charset would probably not display the data with the charset mysqld sees when retrieved and sent back to a DB Connection. Try loading the database on a Dev/Staging Server and experiment with setting default charsets.
Why some defaults use utf8
but some default use latin1
?
This would depend on the OS version of the MySQL Binary. Windows versions may have latin1
while Linux Versions may use utf8
.
Best Answer
character_set_server
is not relevant. The setting on the database is just a default for when youCREATE TABLE
without specifying aDEFAULT CHARACTER SET
.You need
mysqli
, usemysqli_set_charset('utf8')
.CHARACTER SET utf8
. Look atSHOW CREATE TABLE
; if it is not explicit on the column then look at the default for the table.<meta charset=UTF-8>
.