MySQL REPLACE appears to corrupt data

corruptionMySQL

For reference, this issue was posted on StackOverflow.com before coming here.

In MySQL I am running multiple REPLACE queries to update hardcoded URLs in a database. Queries look like this:

UPDATE some_table
SET some_column = REPLACE(some_column, 'some_url', 'another_url')

The update process goes through every table and every string-based column in the database and runs a query like the above to ensure all potential occurrences are updated.

The system on which this is run also maintains some user content in the database. This content, which is (as far as I can tell) supposed to be unaffected by the update, becomes corrupted:

Before: Welcome to Company’s Learning
After : Welcome to Company’s Learning 

The corruption looks like a string that may contain a unicode character which is not replaced correctly during the REPLACE query.

All tables are InnoDB, and utf8_unicode_ci. The server runs MySQL 5.5.25, on IIS 7.5.

Any idea why this kind of corruption is taking place?

Best Answer

That does not look like a corruption problem. It might be a collation issue, not with the database, but with the IIS 7 or Moodle.

You did not specify if MySQL was running in Linux or Windows, so here it goes:

I ran queries to unveil collations and character sets

For MySQL 5.5.12 running on a Windows 7 machine, I get this:

mysql> show variables like '%coll%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | latin1_swedish_ci |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (0.00 sec)

mysql> show variables like '%char%';
+--------------------------+---------------------------------+
| Variable_name            | Value                           |
+--------------------------+---------------------------------+
| character_set_client     | latin1                          |
| character_set_connection | latin1                          |
| character_set_database   | latin1                          |
| character_set_filesystem | binary                          |
| character_set_results    | latin1                          |
| character_set_server     | latin1                          |
| character_set_system     | utf8                            |
| character_sets_dir       | C:\MySQL_5.5.12\share\charsets\ |
+--------------------------+---------------------------------+
8 rows in set (0.00 sec)

mysql>

For MySQL 5.5.9 running in Linux, I get this:

mysql> show variables like '%coll%';
+----------------------+-------------------+
| Variable_name        | Value             |
+----------------------+-------------------+
| collation_connection | utf8_general_ci   |
| collation_database   | latin1_swedish_ci |
| collation_server     | latin1_swedish_ci |
+----------------------+-------------------+
3 rows in set (1.75 sec)

mysql> show variables like '%char%';
+--------------------------+----------------------------+
| Variable_name            | Value                      |
+--------------------------+----------------------------+
| character_set_client     | utf8                       |
| character_set_connection | utf8                       |
| character_set_database   | latin1                     |
| character_set_filesystem | binary                     |
| character_set_results    | utf8                       |
| character_set_server     | latin1                     |
| character_set_system     | utf8                       |
| character_sets_dir       | /usr/share/mysql/charsets/ |
+--------------------------+----------------------------+
8 rows in set (1.69 sec)

mysql>

You have to make sure your connector software's character set/collation settings match the database it is connecting to. If the REPLACE ran on the DB Server, the REPLACE is a case-sensitive match. I could easily see a collation shift in the eyes of IIS or Moodle messing up a server-side string replace.

You may want to substitute this with doing a client side replace using PHP's str_replace