Mysql – “Data truncated” vs “Incorrect string value”

character-setMySQL

I'm trying to figure out the difference between some warning messages related to data truncation. Consider the following table:

CREATE TABLE `txttest` (   
     `mycol` text NOT NULL ) 
ENGINE=InnoDB DEFAULT CHARSET=utf8;

Pretty much expected behavior:

mysql > insert into txttest (mycol) values (repeat('a',65535));
Query OK, 1 row affected (0.17 sec)

mysql > insert into txttest (mycol) values (repeat('a',65536));
Query OK, 1 row affected, 1 warning (0.16 sec)

mysql > show warnings;
+---------+------+--------------------------------------------+
| Level   | Code | Message                                    |
+---------+------+--------------------------------------------+
| Warning | 1265 | Data truncated for column 'mycol' at row 1 |
+---------+------+--------------------------------------------+
1 row in set (0.00 sec)

When I'm using multi byte characters however, the expected warning state occurs but the message is different:

mysql > insert into txttest (mycol) values (repeat('é',65536/2-1));
Query OK, 1 row affected (0.17 sec)

mysql > insert into txttest (mycol) values (repeat('é',65536/2));
Query OK, 1 row affected, 1 warning (0.16 sec)

mysql > show warnings;
+---------+------+----------------------------------------------------------------+
| Level   | Code | Message                                                        |
+---------+------+----------------------------------------------------------------+
| Warning | 1366 | Incorrect string value: '\xC3\xA9' for column 'mycol' at row 1 |
+---------+------+----------------------------------------------------------------+
1 row in set (0.00 sec)

Google attempts aren't very useful as I'm mostly running across pastes of this message for different contexts.

Basically what I'm wanting to know is: Is this just simply the wording/error code Mysql uses when truncating a sequence of multibyte characters, or is there something more telling I should be gleening from this message?

I was initially thinking it meant the byte sequence was getting split in such away it was resulting in malformed characters. Attempts to try and make it do this didn't work (e.g. mysql seemed good about recognizing proper byte boundaries for a character encoding).

Edit:

After relooking over it does look to be the char splitting I initially dismissed. I had an off by one brain fart looking at it initially.

Best Answer

It is an attempt to store a 2-byte string into one byte

Since the maximum length of TEXT is 65535, it can safely hold 32767 (65536/2 - 1) 2-byte characters without an error message.

Any attempt to add 32768 2-byte characters will result in Incorrect string value: '\xC3\xA9' for column 'mycol' at row 1 because the 32768th character does not have the room to be inserted into mycol.

In reality, you do not get a malformed character. You really can a TEXT field whose 2-byte character count and length is 32767. The last character is simply lost and not considered .

To make sure, run this

select length(mycol) from txttest;

The last INSERT will not be length 65536 by 65534.