Mysql – why/how does the number of matched columns influences the way of excecuting a query

full-text-searchinnodbmyisamMySQLperformance

Imagine the following situation:

Table A uses MyISAM and contains 4 fields (text) with a combined FULLTEXT-index.

FULLTEXT fulltext1 | fulltext2 | fulltext3 | fulltext4

Table B uses InnoDB and contains some other fields, where 5 of them are indexed as 'ORD'.

ORD order1 | order2 | order3 | order4 | order5

I like to do a full-text-search on tableA by joining tableB on it's foreign id and ordering the results by the indexed columns of tableB.


Query1 – match all 4 fulltext-columns:

SELECT `tableB`.`id`
FROM `tableA`
INNER JOIN `tableB` ON `tableA`.`tableB_id` = `tableB`.`id` 

WHERE MATCH (
`tableA`.`fulltext1`, `tableA`.`fulltext2`, `tableA`.`fulltext3`, `tableA`.`fulltext4`
)
AGAINST (
'+search*'
IN BOOLEAN
MODE
) 
ORDER BY
`tableB`.`order1` DESC,
`tableB`.`order2` DESC,
`tableB`.`order3` DESC,
`tableB`.`order4` DESC,
`tableB`.`order5` DESC
LIMIT 0,15

takes 1.6565 seconds.

EXPLAIN Query1:

select_type  table   type     possible_keys     key      key_len  ref   rows            Extra
SIMPLE       tableA  fulltext PRIMARY,FULLTEXT  FULLTEXT    0     1                     Using where; Using temporary; Using filesort
SIMPLE       tableB  eq_ref   PRIMARY           PRIMARY     4     db.tableA.tableB_id   1 

no index is used, temporary table was necessary. I don't know what "Extra 1" in second row means.


Query2 – match only 3 columns:

SELECT `tableB`.`id`
FROM `tableA`
INNER JOIN `tableB` ON `tableA`.`tableB_id` = `tableB`.`id` 

WHERE MATCH (
`tableA`.`fulltext1`, `tableA`.`fulltext2`, `tableA`.`fulltext3`
)
AGAINST (
'+search*'
IN BOOLEAN
MODE
) 
ORDER BY
`tableB`.`order1` DESC,
`tableB`.`order2` DESC,
`tableB`.`order3` DESC,
`tableB`.`order4` DESC,
`tableB`.`order5` DESC
LIMIT 0,15

takes 0.0114 seconds.

EXPLAIN Query2:

select_type table   type    possible_keys  key     key_len  ref           rows  Extra
SIMPLE      tableB  index   PRIMARY        ORD     783      NULL          15    Using index
SIMPLE      tableA  eq_ref  PRIMARY        PRIMARY 4        db.tableB.id  1     Using where

The order of the listed tables has changed. For tableB, the index was used, for tableA no temporary table was necessary.


It doesn't matter which fulltext-column I exclude from the query – as long as I match less than 4 of them, I get the duration and explanation of the showed Query2.
Maybe interesting: the amount of rows is equal in both tables, about 180k.

I'd like to know the reason why this behaviour occurs. I mean, it seems like the whole way of excecution would depend on the number of matched columns.

edit: now I'm completely confused. I deleted the fulltext-index. I matched all 4 (ex-)fulltext-columns (Query1). Now it takes 0.1205 seconds. EXPLAIN shows me that no temporary table is needed; but I wonder how I can match faster in Boolean Mode without having a fulltext-index.

Best Answer

I can provide with a general explanation, but it may not apply specifically to your particular case:

The way decision making works is by evaluation cost of execution plan, then picking up what is hopefully the cheapest plan. This you already know.

When it comes to indexing, though, stuff are getting interesting. The way to evaluate the usefulness or viability of an index is to estimate the selectivity given some value.

For the moment, forget about your FULLTEXT index, and let's assume a simple index on some column col1, and another index on some column col2. Given the following two queries:

SELECT * FROM t WHERE col1 < 10 and col2 = 4;
SELECT * FROM t WHERE col1 BETWEEN 100 AND 110 and col2 = 4;

It may happen that the query is evaluated differently in these two cases. Why? Because it may happen that col2 = 4 returns more rows than col1 < 10, in which case we prefer to use index on col1. But then, it may return less rows than col1 BETWEEN 100 AND 110, in which case we prefer the index on col2.

Your case is not very much different. MySQL estimates the number of rows returned by some index query. When you use more columns, MySQL gets the impression your index is likely to result with few rows. So it chooses to start with TableA, then joins what should be very few rows with TableB.

But if MySQL believes the index to return many rows, it may prefer starting with TableB. Why is that? Because you are sorting on indexed columns of TableB. Sorting is a lot of work, too. So MySQL may choose to first sort the rows, then join to TableA and filter by fulltext index. It may not be a bad idea if the fulltext search yields with many rows anyhow.