MySQL Performance – Special Function Slow in WHERE Clause but Not in SELECT

MySQLperformancequery-performance

I have kind of a special problem using a levenshtein algorithm in MySql.
But I don't think that it's special for levenshtein.
This query:

SELECT *,levenshtein(word,'Facbook') FROM Words WHERE length(word) between 6 and 8 and levenshtein(word,'Facbook') <=1 is pretty slow with around 5000 rows it takes about 1s.

But on the other hand this one:

SELECT *,levenshtein(word,'Facbook') FROM Words WHERE length(word) between 6 and 8 is kind of fast (0.02s).

The explain statments looks nearly identical only the number of rows differs a single bit.

The explain table for the first query:

 | id select_type table type possible_keys key key_len ref rows Extra                    |
 ----------------------------------------------------------------------------------------
 | 1  SIMPLE      Words index NULL         word   302  NULL 4711 Using where; Using index|

The second explain:

 | id  select_type table type   possible_keys key key_len ref rows Extra                  |
 ------------------------------------------------------------------------------------------
 | 1   SIMPLE      Words index  NULL          word 302    NULL 4621 Using where; Using index

Btw I have an index on the word column.
Now the question is why is the first one so much slower? It only has to use the second query and "print out" only the words that matching the second where clause: levenshtein(word,'Facbook') <=1 Right?

Strange:
I used PHPMyAdmin to check the time and now I used a simple php file to send the query and now both queries took 1s.
So maybe this question is totally bullshit…

Best Answer

Here's the difference between the two statements:

The first statement calls the levenshtein function only on the retrieved resultset.

The second statement calls the levenshtein and the length function as a so called "predicate" for every row in the table to retrieve the resultset. Even if levenshtein is only evaluated if the first condition is true, then there's the overhead of doing additional comparisons.

Thus, the second query is naturally slower.

IMHO this is a CPU bound problem. To make this query faster, either supply more predicates that can be evaluated (means: add more restrictions to the where clause), add more CPU power and try parallel selects (if mysql supports that).