I have a table with a CHAR column that stores Korean soccer player names. All of the names are under 12 characters. While LIKE '%name%' works, slight mistypes or variations return nothing, and I've heard that LIKE isn't the fastest option compared to something like a FULLTEXT index. I've also heard that the ngram parser works best with Korean/Japanese/Chinese text, but it seems to require a title/body column; I only want to search for names, nothing else.
I currently have ~30k rows, and the table won't grow beyond ~45k. What's the best way to go about this?
Here's what I tried (after creating a FULLTEXT index on the name column with the ngram parser):
...MATCH (name) AGAINST ('foo' IN NATURAL LANGUAGE MODE);
but it's not as precise as LIKE, since it returns any name that's even remotely connected to the query word.
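For context, a minimal sketch of the setup described above, assuming a hypothetical players table (MySQL's InnoDB engine supports the ngram parser as of 5.7.6):

```sql
-- Hypothetical table; only the name column carries the FULLTEXT index.
CREATE TABLE players (
    id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    name CHAR(12) NOT NULL,
    FULLTEXT INDEX ft_name (name) WITH PARSER ngram
) ENGINE = InnoDB DEFAULT CHARSET = utf8mb4;
```

The ngram parser splits the indexed text into overlapping tokens of ngram_token_size characters (default 2), which is what makes it usable for CJK text that has no word delimiters.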
Best Answer
A trick that sometimes is useful...
When you use both MATCH and LIKE in the same WHERE clause, MATCH runs first (using the FULLTEXT index), leaving very few rows for LIKE to recheck.
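The combined query might look like the following sketch (the players table and name column are assumed from the question):

```sql
-- MATCH uses the FULLTEXT index to narrow the candidate rows cheaply;
-- LIKE then rechecks only those few rows for an exact substring match,
-- restoring the precision the questioner found lacking in MATCH alone.
SELECT name
FROM   players
WHERE  MATCH(name) AGAINST('foo' IN BOOLEAN MODE)
  AND  name LIKE '%foo%';
```

The LIKE on its own would be a full table scan, but applied after MATCH it only touches the handful of rows the index already selected.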
(I am not familiar with how NGRAM works, so I can't provide specifics there.)
From the 5.7.23 and 8.0.12 Changelogs: "The ngram full-text search parser permitted comma and period characters to be tokenized as words, which caused an inconsistency between boolean and natural language mode search results. Comma and period characters are no longer tokenized." (This implies that 5.7.23 has a fix for it.)