Looking at the MySQL server variables the server and database collation are set to latin1_swedish_ci but the collaction_connection is utf8_general_ci. Additionally, the collation/charset is set at many levels: server, database, table and column. You also have the charset of the PHP MySQL connection to consider.
My question is fourfold:
-
Why are there different levels of MySQL collation/charsets? Is it so you can mix up your character sets to suit your needs? Correct me if I'm wrong but utf8 seems to be the best character set for general use so why is it that most of the LAMP setups I've used have latin1 as default?
-
Should you always ensure your PHP connection matches the charset of the database you're working on?
-
If you can have different tables that use different character sets do you just use SET NAMES or mysql(i)_set_charset to switch?
-
If you have a table that has multiple charsets how do you manage that since the connection can only use one charset at a time?
Many thanks.
EDIT:
Regarding the comment:
“You can use either, the former one only to set results charset and the latter one to set either PHP internal encoding for use with mysqli_real_escape_string and results encoding.”
I thought you aren't meant to use real_escape_string and SET NAMES together. See: http://www.php.net/manual/en/mysqlinfo.concepts.charset.php
Best Answer
QUESTION #1
ANSWER TO QUESTION #1
There are two good reasons for different character sets and collations
Reason #1 : Disk Space
When you run this query
You get this:
Some character sets have a Maximum Length of 1 byte to represent a character. Other need more. Give this information, you may want to refrain from using the eucjpms, ujis, utf8, utf16, utf32, utf8mb4 character sets so that VARCHAR and TEXT data takes less space on disk.
Reason #2 : Internationalization
Characters Sets Each Come With One or More Collations to cover a variety of Languages
When you run this query
You will see that some Characters Sets have with multiple collations for Different Parts of Europe. Chinese, Japanese, Greek, and parts of Asia Minor and Scandinavia are also available.
QUESTION #2
ANSWER TO QUESTION #2
SCENARIO
You are driving at 3:00 AM. You are the only driver on the road. You come to an intersection. You have the red light.
Question : Do you stop or go through the red light?
Answer : Depends on the neighborhood
How does this apply?
You should err on the side of caution. You should always check the charset beforehand because you do not know the neighborhood (client program, internet browser) the PHP connection will be entering and if there is a risk of a carjacking (putting invalid data into the database, requesting too much data for retrieval).
QUESTION #3
ANSWER TO QUESTION #3
By all means
QUESTION #4
ANSWER TO QUESTION #4
You may have to shift character sets with the DB Session. Here are the settings that can be changed at the session level:
Please set these carefully before reading from and writing to the database. It would also be wise to store the character set name and collation in the same table you will be accessing.