Python MySQL UTF-8 – Handling Non-ASCII Characters

mysql-5.7 python utf-8

I'm facing the following problem: I have a MySQL database that contains non-ASCII characters, such as "ñ", in some records.

What setup do I need to perform on the database in order to deliver non-ASCII characters to a Python application?

To give you more detail: so far I have changed all the character_set_* and collation_* variables to UTF-8, but it is still not working. Before these modifications, the charset and collation were configured as latin1.

MySQL

`mysql> SHOW VARIABLES LIKE "character%";
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | utf8                       |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | utf8                       |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+

8 rows in set (0.00 sec)

mysql> SHOW VARIABLES LIKE "collation%";
+----------------------+-----------------+
| Variable_name        | Value           |
+----------------------+-----------------+
| collation_connection | utf8_unicode_ci |
| collation_database   | utf8_unicode_ci |
| collation_server     | utf8_unicode_ci |
+----------------------+-----------------+
3 rows in set (0.00 sec)


mysql> SHOW FULL COLUMNS FROM tjs_stage.avisos WHERE  Collation="utf8_unicode_ci"\G;
*************************** 1. row ***************************
Field: Titulo
Type: varchar(32)
Collation: utf8_unicode_ci
Null: YES
Key: 
Default: NULL
Extra: 
Privileges: select,insert,update,references
Comment: 
*************************** 2. row ***************************
Field: Aviso
Type: varchar(256)
Collation: utf8_unicode_ci
Null: NO
Key: 
Default: NULL
Extra: 
Privileges: select,insert,update,references
Comment: 
2 rows in set (0.00 sec)


mysql> SELECT * FROM tjs_stage.avisos \G;
*************************** 1. row ***************************
Id: 1
Titulo: PRIMERO AVISO
Aviso: MAÑANA EMPIEZAM VACACIONES
Creacion: 2016-04-06 18:12:52
Modificacion: 2016-04-06 18:12:52
*************************** 2. row ***************************
Id: 49
Titulo: Aviso de Vacaciones
Aviso: Mañana empezam las vacaciones!
Creacion: 2016-05-26 21:46:34
Modificacion: 2016-05-26 21:46:34
2 rows in set (0.00 sec)`

Python

Here is the code I wrote for testing purposes:

`#!/usr/bin/python
# -*- coding: utf-8 -*-

import MySQLdb
import sys

try:
    con = MySQLdb.connect(host = "127.0.0.1", user = "root", passwd = "drootmysql", charset='utf8')  
    print "\nMySQL connection: OK\n"

except MySQLdb.Error:
    print "\nConnection failed...\n"
    sys.exit()


cur = con.cursor()

insert_query = "INSERT INTO tjs_stage.avisos (Titulo,Aviso) VALUES('Aviso de Vacaciones','Mañana empezam las vacaciones!')"  
cur.execute(insert_query)
con.commit()

select_query = "SELECT * FROM tjs_stage.avisos"
cur.execute(select_query)
con.commit()

records = cur.fetchall()

for row in records:
    print row`

Here's the result:
`$ python mysql_json_data.py

MySQL connection: OK


(1L, 'Aviso de Vacaciones', 'Ma\xc3\xb1ana empezam las vacaciones!', datetime.datetime(2016, 5, 26, 16, 38, 40), datetime.datetime(2016, 5, 26, 16, 38, 40))  `

I just want to retrieve the data exactly as it is stored in the database.
Is there any other setup I should perform in MySQL in order to serve the non-ASCII characters as they are?

Thanks in advance guys!

Best Answer

Is mysql_json_data.py input or output or code or a dump?

'MA\xd1ANA... has Ñ encoded in latin1 (and perhaps escaped for display?)
'Ma\xc3\xb1ana... has ñ encoded in utf8 (and perhaps escaped for display?)
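
To make the two byte sequences concrete, here is a minimal sketch written for Python 3 (the question's script is Python 2, where the same bytes show up inside a plain `str`). The variable names are mine, not from the question. It shows that the escaped forms are only how Python displays raw bytes, and that decoding the UTF-8 bytes gives back the original text:

```python
# Text is written with \u escapes so the example does not depend on the
# encoding of the source file itself.
upper = "MA\u00d1ANA"   # "MAÑANA"
lower = "Ma\u00f1ana"   # "Mañana"

# The same kind of character, encoded two different ways.
latin1_bytes = upper.encode("latin-1")  # Ñ becomes the single byte 0xD1
utf8_bytes = lower.encode("utf-8")      # ñ becomes the two bytes 0xC3 0xB1

# repr() escapes non-ASCII bytes for display; the data itself is intact.
print(repr(latin1_bytes))
print(repr(utf8_bytes))

# Decoding with the matching encoding recovers the readable text.
print(utf8_bytes.decode("utf-8"))
```

So an escaped `\xc3\xb1` in a printed tuple does not by itself mean the data is stored incorrectly; it may simply be the display form of correctly encoded UTF-8.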

If you have one file with both in it, you have a mess.

The former will probably be truncated as you attempt to put it into your MySQL table. The latter will probably insert correctly.
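
One way to see why the latin1 form causes trouble on a UTF-8 connection: the lone byte `\xd1` is not valid UTF-8. The sketch below (Python 3) illustrates that with a hypothetical helper, `decode_text`, which is my own name and not part of MySQLdb. It tries UTF-8 first and falls back to latin1. Note that latin1 decoding never fails, so the fallback is only a guess and is no substitute for cleaning up the mixed data:

```python
def decode_text(raw):
    """Hypothetical helper: decode bytes as UTF-8, falling back to latin-1."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # 0xD1 followed by an ASCII letter is not a valid UTF-8 sequence,
        # so latin-1 encoded text ends up here.
        return raw.decode("latin-1")


latin1_sample = b"MA\xd1ANA"        # Ñ encoded in latin1
utf8_sample = b"Ma\xc3\xb1ana"      # ñ encoded in utf8

print(decode_text(latin1_sample))
print(decode_text(utf8_sample))
```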

See my notes on Python.