MySQL converting column of type TEXT to TINYTEXT truncates value at non-intuitive position, why

MySQLmysql-5.6

I recently ran into a problem where a table had a column of type TEXT inadvertently converted to type TINYTEXT (MySQL version 5.6.24). We were able to restore any truncated data, but the confusing thing was was that we only noticed the problem when certain values for the column started appearing to be truncated at seemingly random positions in the text string.

After trying to wrap my head around what happened I finally realized that each value originally >= 256 chars was truncated to a length equal to the difference between its original length and the nearest lower multiple of 256. Eg:

Original Length Truncated Length Difference
257             1                256
350             94               256
597             85               512

I then confirmed this is exactly what happens when you convert a TEXT column to TINYTEXT. It's as if it iteratively removes chunks of 256 bytes until the total length is < 256. But why wouldn't it simply just truncate the value to 255 bytes to begin with? Seems like a more efficient operation, but I have no idea about how the internals work.

I couldn't find an explanation on the MySQL documentation. I mean, either way if you have values that exceed the max length you're going to lose data; obviously one wouldn't do this on purpose, but figuring out what happened would have been a bit less confusing if all of the truncated values had length 255.

EDIT: I confirmed this behavior for both the MyISAM and InnoDB storage engines.

EDIT AGAIN: Code to reproduce scenario:

create temporary table tmp (
    text_col text
)
collate = 'utf8_general_ci'
engine = InnoDB;

create temporary table results (
    current_value text,
    length_text_col int(11),
    char_length_text_col int(11),
    text_col_type char(8)
)
collate = 'utf8_general_ci'
engine = InnoDB;

insert into tmp
    (text_col)
values
    (repeat('a', 257)), -- 1 byte/character
    (repeat('ア', 86)); -- 3 bytes/character

insert into results
select
    text_col as current_value,
    length(text_col) as length_text_col,
    char_length(text_col) as char_length_text_col,
    'TEXT' as text_col_type
from
    tmp;

alter table tmp
change column text_col text_col tinytext;

insert into results
select
    text_col as current_value,
    length(text_col) as length_text_col,
    char_length(text_col) as char_length_text_col,
    'TINYTEXT' as text_col_type
from
    tmp;

select * from results;

The above code will select:

current_value  length_text_col  char_length_text_col text_col_type
aaaaaaaaaa...  257              257                  TEXT
アアアアア....  258              86                   TEXT
a              1                1                    TINYTEXT
               2                2                    TINYTEXT

Best Answer

There is essentially no good reason to ever use TINYTEXT. And there are some obscure drawback.

If you are using utf8 or utf8mb4, then you may have "multi-byte" characters. TINYTEXT is limited to 255 bytes. VARCHAR(255), which is similar in many ways, is limited to 255 characters.

If your text is any any Asian language, then only about 85 characters can be stored in TINYTEXT. For VARCHAR(255), a full 255 characters can be stored.

More

I agree that the ALTER is messing thing up in a strange way. Here's a shorter example:

DROP TEMPORARY TABLE tmp;
create temporary table tmp (
    text_col text
) collate = 'utf8_general_ci' engine = InnoDB;

insert into tmp
    (text_col)
values
    (repeat('a', 257)), -- 1 byte/character
    (repeat('ア', 86)); -- 3 bytes/character

SHOW CREATE TABLE tmp\G
SELECT text_col, length(text_col) FROM tmp;
alter table tmp
    change column text_col text_col tinytext;
SELECT text_col, length(text_col), HEX(text_col) FROM tmp;  -- Strangly truncated
SHOW CREATE TABLE tmp\G

The HEX comes back with the first two bytes of Katakana 'A', which is E382A2, so I will guess that the ALTER did some kind of modulo -- picking (258 % 256 == 2) bytes to keep.

Fails for me on 5.5.43, 5.6.22 and 8.0.2. On the other hand, MariaDB 10.0.28 and 10.3.7 complain about the ALTER:

mysql> alter table tmp
    ->     change column text_col text_col tinytext;
ERROR 1406 (22001): Data too long for column 'text_col' at row 1

File a bug at bugs.mysql.com.