I am wondering: assuming everything is set up properly, meaning a column with charset utf8mb4
and a client connecting to the database setting the connection encoding to utf8mb4
, is there any way in which said client could SELECT
data which contains invalid UTF-8 byte sequences? Is there any way to insert data into a utf8mb4
column which is not in fact valid UTF-8?
Put another way: is it reasonable to assume that SELECT
ing any non-binary column over a client connection set to utf8mb4
should always return valid UTF-8 byte sequences?
(Let's assume plain SELECT *
statements without any CAST
shenanigans or such.)
Best Answer
I don't think so. But I have come close:
By starting with
BINARY
, then converting toCHAR
, I can get invalid data into the column. However, theALTER
gives a warning when I try to convert it.Note in the final
SELECT
, the data is truncated. The truncation happened in theALTER
, not theSELECT
, where your Question wanted it to happen. This can be verified by converting back toBINARY
:If you have seen the problem happen, please provide details; perhaps I can dissect the problem.