Mysql – Why number of matches in fulltext search get less when we put less strings

full-text-searchmyisamMySQL

SELECT * FROM tableauxiliary WHERE MATCH (FullTextSearch)
AGAINST ('resta*' IN NATURAL LANGUAGE MODE)

show 5 results

SELECT * FROM tableauxiliary WHERE MATCH (FullTextSearch)
AGAINST ('restaurant*' IN NATURAL LANGUAGE MODE)

shows 56k

What am I missing? How can something contains restaurant without containing resta

I am using mysql and MyISAM

Best Answer

Maybe more than 50% of your tablerows contain words starting with "resta*".
Words that are present in 50% or more of the rows are considered common and do not match.

So you would only have those rows returned, that have a non-common word also starting with "resta*" in your result.

see: http://dev.mysql.com/doc/refman/5.5/en/fulltext-natural-language.html

Try

SELECT * FROM tableauxiliary WHERE MATCH (FullTextSearch)
AGAINST ('resta*' IN BOOLEAN MODE)

see http://dev.mysql.com/doc/refman/5.6/en/fulltext-boolean.html

Boolean full-text searches have these characteristics: - They do not use the 50% threshold that applies to MyISAM search indexes

I got this here:
Why does MATCH() give Score 0, if there are less rows in the database?

Related Solutions

Sql-server – SQL Server Fulltext search against big amount of search terms over some period of time

Execution plan XML would be useful here but as a shot in the dark:

Daft as it may sound, try pushing either the [id] or date predicate into the CONTAINSTABLE query. See SQL Server 2005 Full-Text Queries on Large Catalogs: Lessons Learned - Consider embedding filter conditions as keywords in the indexed text.
I don't remember where but I recall reading an article or blog post some time ago that flagged big OR full-text queries as problematic. IIRC the suggested hack/workaround was to issue a UNION query instead.

For your example, the UNION for 2) would be along the lines of:

INSERT
    @processed_rules
    (
    rule_id
    , story_id
    )
SELECT
    @rule_id
  , s.[key]
FROM
   CONTAINSTABLE(Stories, (header, body), '"term one"')

UNION

SELECT
    @rule_id
  , s.[key]
FROM
   CONTAINSTABLE(Stories, (header, body), '"term two"')

UNION

  SELECT
    @rule_id
  , s.[key]
FROM
   CONTAINSTABLE(Stories, (header, body), '"term three"')

Mysql – why/how does the number of matched columns influences the way of excecuting a query

I can provide with a general explanation, but it may not apply specifically to your particular case:

The way decision making works is by evaluation cost of execution plan, then picking up what is hopefully the cheapest plan. This you already know.

When it comes to indexing, though, stuff are getting interesting. The way to evaluate the usefulness or viability of an index is to estimate the selectivity given some value.

For the moment, forget about your FULLTEXT index, and let's assume a simple index on some column col1, and another index on some column col2. Given the following two queries:

SELECT * FROM t WHERE col1 < 10 and col2 = 4;
SELECT * FROM t WHERE col1 BETWEEN 100 AND 110 and col2 = 4;

It may happen that the query is evaluated differently in these two cases. Why? Because it may happen that col2 = 4 returns more rows than col1 < 10, in which case we prefer to use index on col1. But then, it may return less rows than col1 BETWEEN 100 AND 110, in which case we prefer the index on col2.

Your case is not very much different. MySQL estimates the number of rows returned by some index query. When you use more columns, MySQL gets the impression your index is likely to result with few rows. So it chooses to start with TableA, then joins what should be very few rows with TableB.

But if MySQL believes the index to return many rows, it may prefer starting with TableB. Why is that? Because you are sorting on indexed columns of TableB. Sorting is a lot of work, too. So MySQL may choose to first sort the rows, then join to TableA and filter by fulltext index. It may not be a bad idea if the fulltext search yields with many rows anyhow.

Best Answer

Related Solutions

Sql-server – SQL Server Fulltext search against big amount of search terms over some period of time

Mysql – why/how does the number of matched columns influences the way of excecuting a query

Related Question