Mysql – 1 hour query, is there another way to do this

join;MySQLperformancequery-performance

I saw a couple questions similar to this but none could answer my question, really.

I have two tables, and I am trying to find those rows where a match is found.
user has about 350,000 entries and match has about 55,000 entries. There are UNIQUE indices on both user.name as well as on match.string.

My query is:

SELECT user.name
FROM user
INNER JOIN match
on ( 
      (match.type = 'exact' AND user.name = match.string)
   OR (match.type = 'begin' AND user.name LIKE CONCAT(match.string, '%'))
   OR (match.type = 'end' AND user.name LIKE CONCAT('%', match.string))
)

The query is running for an hour now, and no end in sight.
Is there a faster way to do that?

I don't need this for regular production, but I think a query that runs for so long is unacceptable.

Best Answer

No way! Every 'end' is not optimizable because of the leading wildcard. That means that if there are, say, a mere 3K 'end' entries in match, then there will be over a billion (350K * 3K) tests to perform!

The query can be partially optimized by

INDEX(type, string) -- in `match`
INDEX(name) -- in `user`
SELECT ... FROM user JOIN match ON match.type='exact' AND user.name=match.string
UNION ALL
SELECT ... FROM user JOIN match ON match.type='begin' AND user.name LIKE CONCAT...
UNION ALL
SELECT ... FROM user JOIN match ON match.type='end' AND user.name LIKE CONCAT...;
  • The match INDEX minimizes the effort in finding the relevant rows.
  • The user INDEX is useful for the first two SELECTs.
  • The UNION lets the optimizer operate on each SELECT independently.
  • The ALL avoid a de-dupping phase that is (I assume) unnecessary.

Have you considered a FULLTEXT index? If it works, it will be a million times as fast. (Perhaps literally a million!)

Here's a kludge that can make the third SELECT sargable (gee, I learned a new word!) -- Reverse both the match.string and the user.name, then do the LIKE like 'begin'. No, don't use the REVERSE function in this query, actually build the two tables (or maybe a separate pair of tables) with the reversed strings.