Mysql – Remove all special characters without removing accented characters

MySQL

I need to clean up a column that has all types of weird characters with in the string data. The weird characters are pretty much these []!@#$%.&*`~^-_{}:;<>/\|()

I only need to allow [0-9], [a-z] but also accented characters due to some spanish words being included.

I've been looking at https://dev.mysql.com/doc/refman/8.0/en/regexp.html to try to devise a regex that might fit this but my regex foo is weak.

Is there anyway to clean these up without having to look for and replace the special characters individually while still leaving the accented characters?

Best Answer

MySQL 8 has the REGEXP_REPLACE function that should work.

If you only need to leave alphanumeric characters, including accented characters, this would be simply

SELECT REGEXP_REPLACE(your_column, '[^[:alnum:]]+', ' ') ...

to replace any non-alphanumerics with spaces. If you want to only eliminate characters on your list, you'd use something like

SELECT REGEXP_REPLACE(your_column, '[\]\\[!@#$%.&*`~^_{}:;<>/\\|()-]+', ' ') ...

Note that in the latter case you will need to escape ']', '[', '\', and possibly '-' (or put the latter as the first or last element in the set).

dbfiddle link.