Mysql – Issue with Hebrew characters in thesql

character-setMySQL

Lately i have migrated some schemas from mysql 4 to mysql 5.6. Everything works fine except one issue: Hebrew characters appears in Gibberish.

Here is the current configuration:

SHOW VARIABLES LIKE 'char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | hebrew                     |
| character_set_connection | hebrew                     |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | hebrew                     |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

select * from INFORMATION_SCHEMA.CHARACTER_SETS where CHARACTER_SET_NAME='Hebrew';
+--------------------+----------------------+-------------------+--------+
| CHARACTER_SET_NAME | DEFAULT_COLLATE_NAME | DESCRIPTION       | MAXLEN |
+--------------------+----------------------+-------------------+--------+
| hebrew             | hebrew_general_ci    | ISO 8859-8 Hebrew |      1 |
+--------------------+----------------------+-------------------+--------+

select * from INFORMATION_SCHEMA.COLLATIONS where CHARACTER_SET_NAME='Hebrew';
+-------------------+--------------------+----+------------+-------------+---------+
| COLLATION_NAME    | CHARACTER_SET_NAME | ID | IS_DEFAULT | IS_COMPILED | SORTLEN |
+-------------------+--------------------+----+------------+-------------+---------+
| hebrew_general_ci | hebrew             | 16 | Yes        | Yes         |       1 |
| hebrew_bin        | hebrew             | 71 |            | Yes         |       1 |
+-------------------+--------------------+----+------------+-------------+---------+

My dump file contains the following statement for example:

Create Table: CREATE TABLE `table` (
) ENGINE=InnoDB DEFAULT CHARSET=hebrew

I tried to change it to the following, which didn't help:

Create Table: CREATE TABLE `table` (
) ENGINE=InnoDB CHARACTER SET utf8 COLLATE utf8_general_ci

Values for example:

select manufacturer, hex(manufacturer) from cars LIMIT 5;
+--------------+-------------------+
| manufacturer | hex(manufacturer) |
+--------------+-------------------+
| ������  | F1E5E1E0F8E5      |
| ������  | F1E5E1E0F8E5      |
| ������  | F1E5E1E0F8E5      |
| ����      | E0E5E3E9          |
| ����      | E0E5E3E9          |
+--------------+-------------------+

Any suggestions? What could be the issue?

Best Answer

Analysis

Thanks for the updates; it makes things clear.

F1E5E1E0F8E5, is סובארו when interpreted in CHARACTER SET hebrew.

CREATE TABLE `table` (
) ENGINE=InnoDB DEFAULT CHARSET=hebrew

is not sufficient -- It is possible for each VARCHAR column to override the CHARSET. I'll assume that you have not done that.

Since the HEX in the table is consistent with CHARSET hebrew, I will assume the input side of things is consistent.

Solution 1

The "black diamonds" are coming from the output tool. What tool are you using? Looks like the mysql commandline tool? That needs to know how to display the data. By default (I think), it interprets the bytes it gets (from the SELECT) as being UTF-8. But they are not. Are you running on Windows or *nix? I don't know the details of setting it to Hebrew, but that is probably the solution.

Solution 2

Another solution is to start your 'mysql' session with SET NAMES utf8; (or utf8mb4). That will announce that you want the bytes in the client to be utf8. As the Hebrew chracters are coming from the table, they will be converted to utf8, and will (I think, without trying it) dislayed correctly. Meanwhile, the table will continue to be storing F1E5 E1E0 F8E5, etc.

HTML

If you will be displaying on web pages, keep in mind that browsers make guesses on what the encoding is. You could help them by either doing Solution 2 so that they get UTF-8, which they will readily assume, or you can stick with ISO 8859-8 and help them with <meta charset=...> and <form accept-charset...>` tags.

Please respond with the details of how you choose to solve the issue(s). This is a scenario that is somewhat unusual.