NodeJS – Handling utf8_general_ci Issues

jsonMySQLquery

I have a website with a chat system. This chat system has a table 'messages' and one of the columns is 'messagetext' set to utf8_general_ci(MySQL). In my website, I use php to query database. The results are ok.

In my app, i use NodeJS as backend but the results are like that: " ασδασδ ". There are greek words stored in this column.

I have tried iconv-lite,iconv,utf8,decode Json.

My thought, in the website it's getting the value 'ασδασÎ' from column and decoding to utf8, that's how can I see the right results. BUT in app this is not happening, the results are the value instead of the actual word.

What can I do?

(from Comment)

CREATE TABLE messages (
    id int(11) NOT NULL AUTO_INCREMENT, 
    sender int(11) NOT NULL, 
    reciever int(11) NOT NULL, 
    seen varchar(3) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT 'no', 
    datetime datetime NOT NULL, 
    messagetext text NOT NULL, 
    userseen varchar(3) CHARACTER SET utf8 COLLATE utf8_unicode_ci NOT NULL DEFAULT 'no', 
    PRIMARY KEY (id)
) ENGINE=InnoDB AUTO_INCREMENT=94 DEFAULT CHARSET=utf8

Best Answer

node.js needs something like this to start with:

var connection = mysql.createConnection({ ... , charset : 'utf8'});

Yeah, ασδασδ looks Greek to me. What you have is "Mojibake". See https://stackoverflow.com/questions/38363566/trouble-with-utf8-characters-what-i-see-is-not-what-i-stored for discussion of what causes Mojibake.

Do not use any form of encode/decode/conv -- it only compounds the problem. If that link does not help you, provide the extra details that it requests.

See also https://www.npmjs.com/package/utf8

The form should mention utf-8:

<form method="post" action="/your/url/" accept-charset="UTF-8">

Analysis...

'Κ' (Greek Kappa) is the 2 bytes, hex 'CE9A', in utf8. 'Κ' in latin1 is hex 'CE9A' (2 bytes) 'Κ', if converted to utf8 is hex 'C38E C5A1' 'ÃŽÅ¡' is that that looks like when interpreted as latin1.

Note: this last hex is what you show in your Comment.

So, you seem to have "double encoding".

SELECT CONVERT(BINARY(CONVERT(CONVERT(BINARY(CONVERT('ÃŽÅ¡' USING latin1)) USING utf8mb4) USING latin1)) USING utf8mb4);  --> 'Κ'

What is missing is where "latin1" is occurring in the flow of the data.