Mysql – Incorrect string value: ‘\xC5\xABt\xC4\x97’

collationerrorsMySQLschema

I am facing database error Incorrect string value: '\xC5\xABt\xC4\x97'. I did a lot search and tries a few ways, include move from utf8 to utf8mb4. But still not able to solve it.

Following are the database chartset, collation, and table schema.

+--------------------------+--------------------+
| Variable_name            | Value              |
+--------------------------+--------------------+
| character_set_client     | utf8mb4            |
| character_set_connection | utf8mb4            |
| character_set_database   | utf8mb4            |
| character_set_filesystem | binary             |
| character_set_results    | utf8mb4            |
| character_set_server     | utf8mb4            |
| character_set_system     | utf8               |
| collation_connection     | utf8mb4_unicode_ci |
| collation_database       | utf8mb4_unicode_ci |
| collation_server         | utf8mb4_unicode_ci |
+--------------------------+--------------------+

| DELIVEREDEMAIL | CREATE TABLE `DELIVEREDEMAIL` (
  `ID` int(11) NOT NULL AUTO_INCREMENT,
  `RelatedModule` varchar(20) COLLATE utf8mb4_unicode_ci NOT NULL,
  `UserUUID` varchar(40) COLLATE utf8mb4_unicode_ci NOT NULL,
  `Username` varchar(1000) COLLATE utf8mb4_unicode_ci NOT NULL,
  `Email` varchar(255) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `EmailType` enum('NEW_PRECITION_EMAIL','ACTIVITY_EMAIL','LOG_IN_BASED_EMAIL','REPLAY_TO_PREDICT_EMAIL','NEW_TVSHOW_EMAIL','NEW_REWARD_EMAIL','TUNE_IN_EMAIL','EXPIRED_PREDICTION_EMAIL','NEW_FOLLOWER_EMAIL','UCP_UCT_CONFIRMATION_EMAIL') COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `EmailSendingFrequency` enum('DAILY','WEEKLY','BIWEEKLY','MONTHLY','ASAP','UNSUBSCRIBE','SUBSCRIBE') COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `SendTime` timestamp NOT NULL DEFAULT '0000-00-00 00:00:00',
  `DataInEmail` varchar(5000) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `Code` int(11) NOT NULL,
  `Message` varchar(500) COLLATE utf8mb4_unicode_ci NOT NULL,
  PRIMARY KEY (`ID`),
  KEY `notification_senttime` (`SendTime`),
  KEY `notification_emailtype` (`EmailType`),
  KEY `notification_useruuid` (`UserUUID`)
) ENGINE=InnoDB AUTO_INCREMENT=3807922 DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4 |

The error I got is when try to store username such as "Тхьабысым", I got "Incorrect string value: '\xC5\xABt\xC4\x97' for column 'p_Username' at row 4".

Anyone have some idea? Thanks in advance!


EDIT:
I tried to change charset to latin_1 and collation to latin1_swedish_ci, and it worked. Does it means "Тхьабысым" are latin characters but 4 byte UTF-8 as utf8mb4 still not able to represent these characters? What should I do with these kind of characters? Since this is username pulled from Facebook, there can be all kinds of characters.

Best Answer

Since nobody answer the question and I finally figured it out. I'll answer my own question and hope it may help someone later.

Data is being inserted by stored procedure, alter dababase, table charset and collations will not override charset and collation setting in stored procedure, nor the changes in my.cnf.

After all the database chartsets, collations are correct, drop and reinsert the stored procedure. Before drop, the procedure was like this:

| MAILSERVER | sp_getDataForNewReward         | PROCEDURE | root@localhost | 2014-05-19 23:08:38 | 2014-05-19 23:08:38 | DEFINER       |         | utf8                 | utf8_general_ci      | latin1_swedish_ci  |

After drop and reinsert, it become this:

| MAILSERVER | sp_insertDeliveredEmail        | PROCEDURE | root@localhost | 2015-01-15 23:38:24 | 2015-01-15 23:38:24 | DEFINER       |         | utf8mb4              | utf8mb4_unicode_ci   | utf8mb4_unicode_ci |

Then the procedure will work properly. When stored procedure being created, it will get the charset and collation from the database setting, so after database setting, it need a "refresh".