Mysql – Does a huge key length value for a mulibyte column affect the index performance

encodingexplainindexMySQLperformance

When I look at the EXPLAIN results, the key len value is always calculated based on the actual column length multiplied on the maximum number of bytes for the chosen encoding. Say, for a varchar(64) using utf8 encoding the key len is 192.

Does this number affect performance in any way and should I reduce it when possible? I mean, does it make MySQL to reserve some space somewhere that remain unused, or it's just a maximum possible value while the used space is based on the exact data length?

So the actual question is: if I have a column that contains only Latin letters and numbers, should I change its encoding to latin1 from utf8 in regard of the space occupied by the index/overall index performance?

Best Answer

The number is a worst-case estimate. Remember that EXPLAIN doesn't typically read the data, so it has no idea if your data contains only a short string of 1-byte characters on average, or if it contains a full 64 characters each 3-bytes per character.

The performance of the query is much more influenced during execution by the number of pages read, and that varies more directly with the actual data. Short string values will allow more index entries to fit per page, therefore fewer pages need to be read. This is something the EXPLAIN cannot predict.

I would recommend do NOT change your encoding to latin1 solely for the sake of making EXPLAIN output "less confusing." This is not a good reason, and it's bound to limit you later if you want to support UTF-8 characters in that column.

In today's era, there's virtually no reason to use the latin1 character set. I recommend setting all strings to utf8mb4 as a default.

If the column has something like strings of hex digits for a uuid or a hash value, that's different. You could set the character set to ascii. But that's for a reason of the data, not for the convenience of someone reading the EXPLAIN.