If I only use ASCII characters, will a VARCHAR(255) with utf8mb4_0900_ai_ci be larger on disk than a VARCHAR(255) using ASCII?
Mysql – Does MySQL 8 ASCII vs utf8mb4_0900_ai_ci size differ when only using ASCII characters
Tags: character-set, collation, encoding, MySQL, mysql-8.0
Best Answer
The Fiddle is wrong.
The test string is 20 characters / 40 bytes when you declare that the client is encoded in utf8 (or utf8mb4). But if you claim that it is in latin1, you get Mojibake or "double encoding", hence the 30 and 48 that the Fiddle shows.
For further discussion of what went wrong, see "double encoding" in https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored . I don't have the source code to "fix" the Fiddle.
That is, E38182 is the 3-hex-byte UTF-8 encoding of HIRAGANA LETTER A: あ.
But if you treat E38182 (etc.) as latin1, it shows as
゠A ㄠI ㆠU ㈠E ㊠O
Then if you convert again to utf8, you get the double-encoded bytes. You can still recognize the spaces (20), A (41), I (49), etc., but the Hiragana characters have been mangled.
You don't see the double-encoding in Fiddle because the browser is 'kind enough' to 'fix' your mistake. (This makes figuring out what went wrong quite devilish.)
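The round trip described above can be sketched in a few lines of Python, assuming the client's "latin1" behaves like ISO-8859-1 here (Python's "latin-1"):

```python
# Double encoding: UTF-8 bytes misread as latin1, then re-encoded as UTF-8.
s = "\u3042"                             # HIRAGANA LETTER A, あ
utf8_bytes = s.encode("utf-8")
print(utf8_bytes.hex())                  # e38182 -- the 3 hex bytes quoted above

mojibake = utf8_bytes.decode("latin-1")  # bytes misinterpreted as latin1
double = mojibake.encode("utf-8")        # then "converted" to utf8 again
print(double.hex())                      # c3a3c281c282 -- 6 bytes instead of 3
```

Each original byte ≥ 0x80 balloons into a 2-byte UTF-8 sequence, which is why the lengths in the Fiddle no longer match.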
The Chinese hex is E683B3 E79C8B E4BB80 E9A0AD E6B885 E58FAA E582B7 E7B2BE EFBC8C E4B8AD E7BE8E E8A780 E79A84 E68EA5 E5A794 E4B8BB E58091 E8AA8D E58FAF E69893 E795AB E7AD89 E58AA9 E6B5B7 E59BA0 09
(The tab (09) at the end may be an artifact of the formatting.)
The double encoding starts with C3A6 C692 C2B3 (from E6, 83, B3) C3A7 C593 E280B9 C3A4 C2BB E282AC C3A9 C2A0 C2AD C3A6 C2B8 E280A6
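Those bytes can be reproduced in Python. One assumption: MySQL's latin1 is essentially Windows-1252 (Python's "cp1252"), which maps 0x83 to U+0192 and 0x9C to U+0153, matching the C692 and C593 sequences shown:

```python
# Reproduce the start of the double-encoded Chinese text.
raw = bytes.fromhex("e683b3e79c8b")           # first two characters of the hex dump
double = raw.decode("cp1252").encode("utf-8") # misread as latin1/cp1252, re-encoded
print(double.hex())
# c3a6c692c2b3c3a7c593e280b9 -> C3A6 C692 C2B3 C3A7 C593 E280B9
```

This confirms that the "double encoding" path (bytes in, wrong charset declared, re-encoded on the way out) accounts for the inflated byte counts.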
Back to the title question -- there are only minor, subtle differences, even when all you use is ASCII.
You will probably not encounter any measurable difference. Here are some possibilities.
- Certain temp-table actions may hit limits sooner. (This problem existed in 5.7, but may have been more than eliminated in 8.0 by now turning VARCHAR into CHAR when building temp tables.)
- I have yet to see a benchmark that shows whether utf8mb4 collations of ASCII text are or are not as fast as CHARACTER SET latin1 or ascii.
- Index limits are shorter for CHARSET utf8mb4 than for CHARSET ascii. They are probably VARCHAR(3072) for ascii versus VARCHAR(768) for utf8mb4.
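A quick sketch of the two byte-level facts behind this. The sample string is hypothetical, and 3072 is InnoDB's index-prefix limit in bytes (with DYNAMIC row format); MySQL reserves the worst-case bytes per character when sizing an index:

```python
# ASCII-only content occupies identical bytes under utf8mb4 and under ascii.
s = "Hello, world"                            # pure-ASCII sample text
assert s.encode("utf-8") == s.encode("ascii") # same byte sequence on disk

# The index-prefix limit is counted in bytes, but declared in characters,
# so the worst-case bytes-per-character of the charset divides it down.
INDEX_PREFIX_BYTES = 3072
print(INDEX_PREFIX_BYTES // 1)                # ascii/latin1: 3072 characters
print(INDEX_PREFIX_BYTES // 4)                # utf8mb4: 768 characters
```

So the data itself is the same size; only derived limits (index prefixes, temp-table sizing) differ.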