I have a table with a CHAR column that stores Korean soccer player names. All of the names are under 12 characters. While LIKE '%name%' works, slight mistypes or variations return nothing, and I've heard that LIKE isn't the fastest option compared to something like a FULLTEXT index. I've also heard that the ngram parser works best with Korean/Japanese/Chinese text, but it seems to require a title/body column; I only want to search for names, nothing else.
I currently have ~30k rows, and the table won't grow beyond ~45k. What's the best way to go about this?
Here's what I tried (after creating a FULLTEXT index on the name column with the ngram parser):
...MATCH (name) AGAINST ('foo' IN NATURAL LANGUAGE MODE);
but it's not as precise as LIKE, since it returns any name that's even remotely connected to the query word.
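For context, a minimal sketch of the setup described above, assuming a hypothetical players table (MySQL's InnoDB engine supports the ngram parser as of 5.7.6):

```sql
-- Hypothetical table; only the name column carries the FULLTEXT index.
CREATE TABLE players (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name CHAR(12) NOT NULL,
    FULLTEXT INDEX ft_name (name) WITH PARSER ngram
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
```

The ngram parser splits the indexed text into overlapping tokens of ngram_token_size characters (default 2), which is what makes it usable for CJK text that has no word delimiters.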
Best Answer
A trick that sometimes is useful...
When you use both MATCH and LIKE in the same WHERE clause, MATCH runs first (using the FULLTEXT index), leaving very few rows for LIKE to recheck.
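The combined query might look like the following sketch (the players table and name column are assumed from the question):

```sql
-- MATCH uses the FULLTEXT index to narrow the candidate rows cheaply;
-- LIKE then rechecks only those few rows for an exact substring match,
-- restoring the precision the questioner found lacking in MATCH alone.
SELECT name
FROM   players
WHERE  MATCH(name) AGAINST('foo' IN BOOLEAN MODE)
  AND  name LIKE '%foo%';
```

The LIKE on its own would be a full table scan, but applied after MATCH it only touches the handful of rows the index already selected.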
(I am not familiar with how NGRAM works, so I can't provide specifics there.)
From the 5.7.23 and 8.0.12 Changelogs: "The ngram full-text search parser permitted comma and period characters to be tokenized as words, which caused an inconsistency between boolean and natural language mode search results. Comma and period characters are no longer tokenized." (This implies that 5.7.23 has a fix for it.)