Tip: To save space with UTF-8, use VARCHAR instead of CHAR. Otherwise, MySQL must reserve three bytes for each character in a CHAR CHARACTER SET utf8 column because that is the maximum possible length. For example, MySQL must reserve 30 bytes for a CHAR(10) CHARACTER SET utf8 column.
From:http://dev.mysql.com/doc/refman/5.7/en/charset-unicode-utf8.html
Does this mean if I use a varchar utf8 column other than char utf8 column in mysql, mysql will physically store characters using 1 byte for latin and 3 bytes for CJK(Chinese、japanese、Korean) ?
Best Answer
CHAR(...) CHARACTER SET utf8
always takes 3 bytes per character --CHAR(100)
occupies 300 bytes (no length needed).VARCHAR
occupies 1-2 bytes for a length, plus only as many bytes as needed. SoVARCHAR(100)
withhello
will occupy 7 (2+5) bytes in any character set.Señor
, inCHARACTER SET latin1
, take 5 bytes (plus length). In utf8, it takes 6 bytes (plus length). This is becauseñ
is the 1-byte hexF1
in latin1 or the 2-byteC3B1
for utf8.Some Chinese characters and some Emoji, need 4 bytes, so
utf8mb4
is a better choice for them.There is very little use for ever using
CHAR
instead ofVARCHAR
. They are usually for consistent-length fields such as postal code, country code, md5, uuid, etc. And all of those may as well beCHARACTER SET ascii
or possiblylatin1
, but notutf8
.Back to your question... With
CHARACTER SET utf8
(or utf8mb4):"mysql will physically store characters using 1 byte for latin and 3 bytes for CJK" is not phrased correctly. Change 'latin' to 'English' to make it correct. ('Latin' tends to include various accented characters.)