MySQL Replace — Query or in Shell

MySQL

I have numerous tables with some funky character replacement going on. I found some information on running an update to replace these, but it's in numerous tables and numerous fields.

update <table_name>
set <field> = replace(<field>,'’','\'');
update <table_name>
set <field>= replace(<field>,'…','...');
update <table_name>
set <field>= replace(<field>,'–','-');
update <table_name>
set <field>= replace(<field>,'“','"');
update <table_name>
set <field>= replace(<field>,'â€','"');
update <table_name>
set <field>= replace(<field>,'‘','\'');
update <table_name>
set <field>= replace(<field>,'•','-');
update <table_name>
set <field>= replace(<field>,'‡','c');

Do you all have any advice on running this across every field for all the tables? I thought about doing a mysql dump and using sed to replace everything, but I'm not sure how to structure the commands since they're all special characters.

Any help would be greatly appreciated.

Best Answer

You could try to query the table information_schema.columns and find every table in your MySQL DB Instance that have character fields. Here is the table's layout:

mysql> show create table information_schema.columns\G
*************************** 1. row ***************************
       Table: COLUMNS
Create Table: CREATE TEMPORARY TABLE `COLUMNS` (
  `TABLE_CATALOG` varchar(512) NOT NULL DEFAULT '',
  `TABLE_SCHEMA` varchar(64) NOT NULL DEFAULT '',
  `TABLE_NAME` varchar(64) NOT NULL DEFAULT '',
  `COLUMN_NAME` varchar(64) NOT NULL DEFAULT '',
  `ORDINAL_POSITION` bigint(21) unsigned NOT NULL DEFAULT '0',
  `COLUMN_DEFAULT` longtext,
  `IS_NULLABLE` varchar(3) NOT NULL DEFAULT '',
  `DATA_TYPE` varchar(64) NOT NULL DEFAULT '',
  `CHARACTER_MAXIMUM_LENGTH` bigint(21) unsigned DEFAULT NULL,
  `CHARACTER_OCTET_LENGTH` bigint(21) unsigned DEFAULT NULL,
  `NUMERIC_PRECISION` bigint(21) unsigned DEFAULT NULL,
  `NUMERIC_SCALE` bigint(21) unsigned DEFAULT NULL,
  `CHARACTER_SET_NAME` varchar(32) DEFAULT NULL,
  `COLLATION_NAME` varchar(32) DEFAULT NULL,
  `COLUMN_TYPE` longtext NOT NULL,
  `COLUMN_KEY` varchar(3) NOT NULL DEFAULT '',
  `EXTRA` varchar(27) NOT NULL DEFAULT '',
  `PRIVILEGES` varchar(80) NOT NULL DEFAULT '',
  `COLUMN_COMMENT` varchar(1024) NOT NULL DEFAULT ''
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.01 sec)

Here is a query to get every character-based column type for all user-defined tables:

mysql> select DISTINCT COLUMN_TYPE from information_schema.columns
    -> WHERE table_schema NOT IN ('mysql','information_schema','performance_schema')
    -> AND (COLUMN_TYPE REGEXP 'text$'
    -> OR COLUMN_TYPE REGEXP '^varchar'
    -> OR COLUMN_TYPE REGEXP '^char');
+--------------+
| COLUMN_TYPE  |
+--------------+
| mediumtext   |
| varchar(512) |
| varchar(20)  |
| longtext     |
| text         |
| varchar(200) |
| varchar(255) |
| varchar(100) |
| tinytext     |
| varchar(25)  |
| varchar(64)  |
| varchar(32)  |
| varchar(60)  |
| varchar(50)  |
| varchar(250) |
| varchar(150) |
| varchar(10)  |
| varchar(45)  |
| char(5)      |
+--------------+
19 rows in set (0.09 sec)

You could join that query to a second query. The second query would have the name of every table with those column types:

SELECT BB.table_schema,BB.table_name,BB.COLUMN_NAME FROM
(select DISTINCT COLUMN_TYPE from information_schema.columns
WHERE table_schema NOT IN ('mysql','information_schema','performance_schema')
AND (COLUMN_TYPE REGEXP 'text$'
OR COLUMN_TYPE REGEXP '^varchar'
OR COLUMN_TYPE REGEXP '^char')) AA
INNER JOIN
(SELECT table_schema,table_name,COLUMN_NAME,COLUMN_TYPE
FROM information_schema.columns
WHERE table_schema NOT IN
('mysql','information_schema','performance_schema')) BB
USING (COLUMN_TYPE);

Now, message the output to make it write a generic script for you:

SELECT CONCAT('UPDATE ',db,'.',tb,' SET ',fld,
'=REPLACE(',fld,',''oldchar'',''newchar'');')
UpdateCommand
FROM (SELECT db,tb,fld FROM
(
    SELECT table_schema db,table_name tb,COLUMN_NAME fld FROM
    (select DISTINCT COLUMN_TYPE from information_schema.columns
    WHERE table_schema NOT IN ('mysql','information_schema','performance_schema')
    AND (COLUMN_TYPE REGEXP 'text$'
    OR COLUMN_TYPE REGEXP '^varchar'
    OR COLUMN_TYPE REGEXP '^char')) AAA
    INNER JOIN
    (SELECT table_schema,table_name,COLUMN_NAME,COLUMN_TYPE
    FROM information_schema.columns
    WHERE table_schema NOT IN
    ('mysql','information_schema','performance_schema')) BBB
    USING (COLUMN_TYPE)
) AA) A;

Output should look something like this:

+-----------------------------------------------------------------------------------------------+
| UpdateCommand                                                                                 |
+-----------------------------------------------------------------------------------------------+
| UPDATE example.user SET user_name=REPLACE(user_name,'oldchar','newchar');                     |
| UPDATE garbage.rolando SET post_content=REPLACE(post_content,'oldchar','newchar');            |
| UPDATE garbage.rolando SET post_title=REPLACE(post_title,'oldchar','newchar');                |
| UPDATE garbage.rolando SET post_excerpt=REPLACE(post_excerpt,'oldchar','newchar');            |
| UPDATE garbage.rolando SET post_status=REPLACE(post_status,'oldchar','newchar');              |
| UPDATE garbage.rolando SET comment_status=REPLACE(comment_status,'oldchar','newchar');        |
| UPDATE garbage.rolando SET ping_status=REPLACE(ping_status,'oldchar','newchar');              |
| UPDATE garbage.rolando SET post_password=REPLACE(post_password,'oldchar','newchar');          |
| UPDATE garbage.rolando SET post_name=REPLACE(post_name,'oldchar','newchar');                  |

Last step for generic script generation is to output the query to a text file like this:

mysql -u... -p... -A --skip-column-names -e"SELECT CONCAT('UPDATE ',db,'.',tb,' SET ',fld, '=REPLACE(',fld,',''oldchar'',''newchar'');') UpdateCommand FROM (SELECT db,tb,fld FROM (     SELECT table_schema db,table_name tb,COLUMN_NAME fld FROM     (select DISTINCT COLUMN_TYPE from information_schema.columns     WHERE table_schema NOT IN ('mysql','information_schema','performance_schema')     AND (COLUMN_TYPE REGEXP 'text$'     OR COLUMN_TYPE REGEXP '^varchar'     OR COLUMN_TYPE REGEXP '^char')) AAA     INNER JOIN     (SELECT table_schema,table_name,COLUMN_NAME,COLUMN_TYPE     FROM information_schema.columns     WHERE table_schema NOT IN     ('mysql','information_schema','performance_schema')) BBB USING (COLUMN_TYPE) ) AA) A;" > CharUpdateScript.sql

You can customize the generic script from there.