Mysql – Mariadb (MySQL) On Windows- problem entering non-ASCII characters in a query

mariadbMySQLunicode

MariaDB: mysql  Ver 15.1 Distrib 10.3.8-MariaDB
OS: Windows 7 Pro 64 bit

I have my server database and client all set up (I believe) to support utf8 end to end-

(**my.cnf**)

#
# Set everything to utf8
#
# (per https://mariadb.com/kb/en/library/setting-character-sets-and-collations/)
#
[client]
default_character_set=utf8mb4

[mysql]
default_character_set=utf8mb4

[mysqld]
character_set_server = utf8mb4
collation_server = utf8mb4_unicode_ci
init_connect='SET NAMES utf8mb4'


(**Console**)
    chcp
    Active code page: 65001

(**Mysql client**)
show variables like '%char%';
+--------------------------+-----------------------------------------------+
| Variable_name            | Value                                         |
+--------------------------+-----------------------------------------------+
| character_set_client     | utf8mb4                                       |
| character_set_connection | utf8mb4                                       |
| character_set_database   | utf8                                          |
| character_set_filesystem | binary                                        |
| character_set_results    | utf8mb4                                       |
| character_set_server     | utf8                                          |
| character_set_system     | utf8                                          |
| character_sets_dir       | C:\Program Files\MariaDB 10.3\share\charsets\ |
+--------------------------+-----------------------------------------------+

I can store and display unicode characters such as the micron (µ) Omega (Ω) and em dash (—) in the database by importing them from csv files.

However, as soon as I started to try out some queries using unicode characters I hit a problem. Basically, entering a "more than 7 bit" unicode character in a quoted string seems to hide the closing quote mark from the mysql client, which then stays stuck in quote mode. Simple example below-

MariaDB [uom]> -- Using plain ASCII- this works
MariaDB [uom]> select replace('cm', 'c', 'm');
+-------------------------+
| replace('cm', 'c', 'm') |
+-------------------------+
| mm                      |
+-------------------------+
1 row in set (0.000 sec)

MariaDB [uom]> -- Try using a unicode character in the query, this fails
MariaDB [uom]> select replace('µm', 'µ', 'u');
    '>
    '>
    '> '
    -> ;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MariaDB server version for the right syntax to use near '' at line 1

Interestingly, if I run the same query non-interactively it executes, but the output seems to indicate something in the mysql command line path is not using unicode as the unicode characters are converted to "?" on output-

mysql -e "select replace('µm', 'µ', 'u');"
+-------------------------+
| replace('?m', '?', 'u') |
+-------------------------+
| um                      |
+-------------------------+

I'm clearly missing a piece of the puzzle here. Any ideas much appreciated.

EDIT

I'm working on a non-production db, so I was able (I thought) to set everything top to bottom to utf8mb4 and re-import my data. Still seeing oddities-

  1. The system is ignoring the setting for character_set_server in my.cnf. Even though I have the line-

    character_set_server = utf8mb4

in my.cnf, the system still reports it is set to "utf8" after restarting the server. If I issue the commmand set character_set_server='utf8mb4'; interactively in the client, the variable changes value for the session, but this has no affect on the other issues.

  1. When I import a "latin1" file (it's actually cp1252, but Mariadb insists on calling this latin1) containing characters above the 7 bit level, i'm seeing double-encoding. E.g:

    select unit, hex( left(unit, 2)) from codelist where id=3;
    +———+———————+
    | unit | hex( left(unit, 2)) |
    +———+———————+
    | µmol/L | C2B56D |
    +———+———————+
    1 row in set (0.001 sec)

The "micro" character, which should be hex B5 is stored as xC2B5 which is a chinese / Hangul character. With the server, client, and connection all set to utf8mb4, and the csv file i'm importing identified as latin1, I can;t see why double-encoding is happening.

The oddities with the myql client program (emitting "?" characters from the command below, even though it's set to run in utf8mb4-

mysql -e "select replace('µm', 'µ', 'u');"
+-------------------------+
| replace('?m', '?', 'u') |
+-------------------------+
| um                      |
+-------------------------+

mysql -e "show variables like '%char%';"
+--------------------------+-----------------------------------------------+
| Variable_name            | Value                                         |
+--------------------------+-----------------------------------------------+
| character_set_client     | utf8mb4                                       |
| character_set_connection | utf8mb4                                       |
| character_set_database   | utf8mb4                                       |
| character_set_filesystem | binary                                        |
| character_set_results    | utf8mb4                                       |
| character_set_server     | utf8                                          |
| character_set_system     | utf8                                          |
| character_sets_dir       | C:\Program Files\MariaDB 10.3\share\charsets\ |
+--------------------------+-----------------------------------------------+

and choking on typed-in query strings that include characters like the micro sign also continue.

TBH it's looking like either the Mariadb mysql client is not properly implementing the character set settings it's given, or Windows itself is screwing something up in the environment.

I may end up trying the whole thing on linux to try to determine whether it's the Mariadb side or the OS that is causing the problem.

Best Answer

When using the cmd prompt, set that window to use the code page chcp 65001:

The command "chcp" controls the "code page". chcp 65001 provides utf8, but it needs a special charset installed, too. some code pages. To set the font in the console window: Right-click on the title of the window → Properties → Font → pick Lucida Console

See also https://dev.mysql.com/doc/refman/5.6/en/mysql-tips.html#windows-unicode-support