Mysql – Why is LIKE more than 4x faster than MATCH…AGAINST on a FULLTEXT index in MySQL

full-text-searchMySQL

I'm not getting this.

I've got a table with these indexes

PRIMARY     post_id
INDEX       topic_id
FULLTEXT    post_text

Table has (only) 346 000 rows. I am trying to perform 2 queries.

SELECT post_id 
FROM phpbb_posts 
WHERE topic_id = 144017 
AND post_id != 155352 
AND MATCH(post_text) AGAINST('http://rapidshare.com/files/5494794/photo.rar')

takes 4.05 seconds while

SELECT post_id 
FROM phpbb_posts 
WHERE topic_id=144017 
AND post_id != 155352 
AND post_text LIKE ('%http://rapidshare.com/files/5494794/photo.rar%')

takes 0.027 seconds.

EXPLAIN shows that the only difference is in possible_keys (fulltext has post_text included, LIKE does not)

That's really strange.

What's behind this? What's happening in the background? How can LIKE be so fast when not using index and FULLTEXT so slow when using its index?

UPDATE1:

Actually it now takes about 0.5 seconds, maybe table was locked, but still, when I turn on profiling it's show that FULLTEXT INITIALIZATION took 0.2seconds. What's up?

I can query my table with LIKE 10x a second, with fulltext only 2x

UPDATE2:

Surprise!

mysql> SELECT post_id FROM phpbb_posts WHERE post_id != 2 AND topic_id = 6 AND MATCH(post_text) AGAINST ('rapidshare.com');
Empty set (0.04 sec)

so I'm asking, how is this possible?

Additionally,

SELECT count(*) FROM phpbb_posts WHERE MATCH(post_text) AGAINST ('rapidshare.com')

is really slow. Can be fulltext any broken?

UPDATE3:

What the hell?

SELECT forum_id, post_id, topic_id, post_text  FROM phpbb_posts  WHERE MATCH(post_text) AGAINST ('rapidshare.com') LIMIT 0, 30;

takes 0.27s while

SELECT count(*) FROM phpbb_posts  WHERE MATCH(post_text) AGAINST ('rapidshare.com') LIMIT 0, 30;

takes more than 30 seconds! What is going wrong here?

Best Answer

I think the problem may stem from the presence of the FULLTEXT index itself.

Every time there is a query involving a FULLTEXT index, the MySQL Query Optimizer tends to whack the query into a full table scan. I have seen this over the years. I also wrote an earlier post about this most trifling behavior in FULLTEXT indexes.

You may need to do two things:

  1. refactor the query so that the FULLTEXT index does not throw the MySQL Query Optimizer into a state of confusion
  2. Add an additional index that will properly support the refactored query

REFACTOR THE QUERY

Here is your original query

SELECT post_id  
FROM phpbb_posts  
WHERE topic_id = 144017  
AND post_id != 155352  
AND MATCH(post_text) AGAINST('http://rapidshare.com/files/5494794/photo.rar') 

You will need to refactor the query like this:

SELECT subqueryA.post_id
FROM
(
    SELECT post_id FROM phpbb_posts
    WHERE topic_id = 144017
    AND post_id != 155352
) subqueryA
INNER JOIN
(
    SELECT post_id FROM phpbb_posts
    WHERE MATCH(post_text) AGAINST('http://rapidshare.com/files/5494794/photo.rar')
) subqueryB
USING (post_id);

CREATE A NEW INDEX

You will need an index to support subqueryA. You already have an index on topic_id. You need to replace it as follows:

ALTER TABLE phpbb_posts ADD INDEX topic_post_ndx (topic_id,post_id);
ALTER TABLE phpbb_posts DROP INDEX topic_id;

Give it a Try !!!

UPDATE 2012-03-19 13:08 EDT

Try this one first

SELECT post_id FROM
(
    SELECT * FROM phpbb_posts
    WHERE topic_id = 144017
    AND post_id != 155352
) A;

If this runs fast and return a small number of rows, then try this nested subquery:

SELECT post_id FROM
(
    SELECT * FROM phpbb_posts
    WHERE topic_id = 144017
    AND post_id != 155352
) A
WHERE MATCH(post_text) AGAINST('http://rapidshare.com/files/5494794/photo.rar');

UPDATE 2012-03-19 13:11 EDT

Compare the running time of this:

SELECT count(*) FROM phpbb_posts  WHERE MATCH(post_text) AGAINST ('rapidshare.com') LIMIT 0, 30;

with this

SELECT count(*) FROM phpbb_posts WHERE 1 = 1;

If there running time is the same, then the MATCH clause is being executed on every row. As I mentioend earlier, using FULLTEXT indexes tends to nullify any benefits attempted and contributed by the MySQL Query Optimizer.