Mysql – find 16,000 words and replace with ##

MySQLmysql-workbenchqueryreplaceupdate

I need to do a find a replace but rather than find one word to replace with another I have about 16k words, so as an example; table 1 has two words apple,orange – table 2 has the contents of a book, i want to be able to replace all the words apple and orange that exists in table 2 and replace them with ##, is there a query that does this ? my actual table 1 has about 16k words and table 2 has 8 million words. most of the examples on google suggests i type the words to find but that is not possible when i have 16k words I am using mysql workbench and an absolute beginner

(from comment) table 1 has 16k names in a column – table 2 has emails and text in columns – i want to be able to censor out all the names in table 2 using the names in table one as a reference

Best Answer

It won't be fast, but this may be part of what you need:

SELECT t2.*
    FROM t1
    JOIN t2  ON t2.email LIKE CONCAT('%', t1.name, '%')

That only identifies the rows that should be censored in the email, and only if the alignment of t1.name, etc, is good. That is, if t1.name is a substring of some other name, you could get some extra hits by this method.

Stick LIMIT 20 on that query and see what the output feels like.

Another approach might be:

FULLTEXT(email, text)  -- in t2

SELECT t2.* FROM t2 WHERE
    MATCH(email, text) AGAINST("all 16000 words from t1");

I have never seen such a big match-against, but perhaps it will work.

You could get a text copy of the list this way:

SET @@group_concat_max_len = 1234567;
SELECT GROUP_CONCAT(name SEPARATOR ' ') FROM t1;

and then copy the string of names into the above query.