I took the three strings in your question and added it to a table plus three more string with pankt
instead of punkt
.
The following was executed using MySQL 5.5.12 for Windows
mysql> CREATE TABLE artikel
-> (
-> id INT NOT NULL AUTO_INCREMENT,
-> meldungstext MEDIUMTEXT,
-> PRIMARY KEY (id),
-> FULLTEXT (meldungstext)
-> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.03 sec)
mysql> INSERT INTO artikel (meldungstext) VALUES
-> ('Punkten'),('Zwei-Punkte-Vorsprung'),('Treffpunkt'),
-> ('Pankten'),('Zwei-Pankte-Vorsprung'),('Treffpankt');
Query OK, 6 rows affected (0.00 sec)
Records: 6 Duplicates: 0 Warnings: 0
mysql>
I ran these queries against the table using 3 different approaches
MATCH ... AGAINST
LOCATE
as in the LOCATE function
LIKE
Please note the differences
mysql> SELECT id,meldungstext,
-> COUNT(IF(MATCH (`meldungstext`) AGAINST ('*punkt*' IN BOOLEAN MODE),1,0)) PunktMatch,
-> IF(LOCATE('punkt',meldungstext)>0,1,0) PunktLocate,
-> meldungstext LIKE '%punkt%' PunktLike
-> FROM `artikel` GROUP BY id,meldungstext;
+----+-----------------------+------------+-------------+-----------+
| id | meldungstext | PunktMatch | PunktLocate | PunktLike |
+----+-----------------------+------------+-------------+-----------+
| 1 | Punkten | 1 | 1 | 1 |
| 2 | Zwei-Punkte-Vorsprung | 1 | 1 | 1 |
| 3 | Treffpunkt | 1 | 1 | 1 |
| 4 | Pankten | 1 | 0 | 0 |
| 5 | Zwei-Pankte-Vorsprung | 1 | 0 | 0 |
| 6 | Treffpankt | 1 | 0 | 0 |
+----+-----------------------+------------+-------------+-----------+
6 rows in set (0.01 sec)
mysql>
All the PunktMatch values should bee 3 1's and 3 0's.
Now watch me query them as normal
mysql> SELECT `meldungstext` FROM `artikel`
-> WHERE MATCH (`meldungstext`) AGAINST ('*punkt*' IN BOOLEAN MODE);
+-----------------------+
| meldungstext |
+-----------------------+
| Zwei-Punkte-Vorsprung |
| Punkten |
+-----------------------+
2 rows in set (0.01 sec)
mysql> SELECT `meldungstext` FROM `artikel`
-> WHERE LOCATE('punkt',meldungstext)>0;
+-----------------------+
| meldungstext |
+-----------------------+
| Punkten |
| Zwei-Punkte-Vorsprung |
| Treffpunkt |
+-----------------------+
3 rows in set (0.00 sec)
mysql> SELECT `meldungstext` FROM `artikel`
-> WHERE `meldungstext` LIKE '%punk%';
+-----------------------+
| meldungstext |
+-----------------------+
| Punkten |
| Zwei-Punkte-Vorsprung |
| Treffpunkt |
+-----------------------+
3 rows in set (0.00 sec)
mysql>
OK using MATCH .. AGAINST with punkt does not work. What about pankt ???
mysql> SELECT `meldungstext` FROM `artikel` WHERE `meldungstext` LIKE '%pankt%';
+-----------------------+
| meldungstext |
+-----------------------+
| Pankten |
| Zwei-Pankte-Vorsprung |
| Treffpankt |
+-----------------------+
3 rows in set (0.00 sec)
mysql>
Let's run my big GROUP BY
query against pankt
mysql> SELECT id,meldungstext,
-> COUNT(IF(MATCH (`meldungstext`) AGAINST ('*pankt*' IN BOOLEAN MODE),1,0)) PanktMatch,
-> IF(LOCATE('pankt',meldungstext)>0,1,0) PanktLocate,
-> meldungstext LIKE '%pankt%' PanktLike
-> FROM `artikel` GROUP BY id,meldungstext;
+----+-----------------------+------------+-------------+-----------+
| id | meldungstext | PanktMatch | PanktLocate | PanktLike |
+----+-----------------------+------------+-------------+-----------+
| 1 | Punkten | 1 | 0 | 0 |
| 2 | Zwei-Punkte-Vorsprung | 1 | 0 | 0 |
| 3 | Treffpunkt | 1 | 0 | 0 |
| 4 | Pankten | 1 | 1 | 1 |
| 5 | Zwei-Pankte-Vorsprung | 1 | 1 | 1 |
| 6 | Treffpankt | 1 | 1 | 1 |
+----+-----------------------+------------+-------------+-----------+
6 rows in set (0.01 sec)
mysql>
This is wrong also because I should see 3 0's and 3 1's for PanktMatch.
I tried something else
mysql> SELECT id,meldungstext, MATCH (`meldungstext`) AGAINST ('+*pankt*' IN BOOLEAN MODE) PanktMatch, IF(LOCATE('pankt',meldungstext)>0,1,0) PanktLocate, meldungstext LIKE '%pankt%' PanktLike FROM `artikel` GROUP BY id,meldungstext;
+----+-----------------------+------------+-------------+-----------+
| id | meldungstext | PanktMatch | PanktLocate | PanktLike |
+----+-----------------------+------------+-------------+-----------+
| 1 | Punkten | 0 | 0 | 0 |
| 2 | Zwei-Punkte-Vorsprung | 0 | 0 | 0 |
| 3 | Treffpunkt | 0 | 0 | 0 |
| 4 | Pankten | 1 | 1 | 1 |
| 5 | Zwei-Pankte-Vorsprung | 1 | 1 | 1 |
| 6 | Treffpankt | 0 | 1 | 1 |
+----+-----------------------+------------+-------------+-----------+
6 rows in set (0.00 sec)
mysql>
I added a plus sign to pankt and I got different results. What 2 and not 3 ???
According to the MySQL Documentation, notice what it says about the wildcard character:
*
The asterisk serves as the truncation (or wildcard) operator. Unlike
the other operators, it should be appended to the word to be affected.
Words match if they begin with the word preceding the * operator.
If a word is specified with the truncation operator, it is not
stripped from a boolean query, even if it is too short (as determined
from the ft_min_word_len setting) or a stopword. This occurs because
the word is not seen as too short or a stopword, but as a prefix that
must be present in the document in the form of a word that begins with
the prefix. Suppose that ft_min_word_len=4. Then a search for '+word
+the*' will likely return fewer rows than a search for '+word +the':
The former query remains as is and requires both word and the* (a word
starting with the) to be present in the document.
The latter query is transformed to +word (requiring only word to be
present). the is both too short and a stopword, and either condition
is enough to cause it to be ignored.
Based on this, the wildcard character is applicable for the back of tokens and not for the front. In light of this, the output must be correct because 2 of the 3 punkt's start tokens. Same story with pankt. This at least explains why 2 out of 3 and why less rows.
I can provide with a general explanation, but it may not apply specifically to your particular case:
The way decision making works is by evaluation cost of execution plan, then picking up what is hopefully the cheapest plan. This you already know.
When it comes to indexing, though, stuff are getting interesting. The way to evaluate the usefulness or viability of an index is to estimate the selectivity given some value.
For the moment, forget about your FULLTEXT index, and let's assume a simple index on some column col1
, and another index on some column col2
. Given the following two queries:
SELECT * FROM t WHERE col1 < 10 and col2 = 4;
SELECT * FROM t WHERE col1 BETWEEN 100 AND 110 and col2 = 4;
It may happen that the query is evaluated differently in these two cases. Why? Because it may happen that col2 = 4
returns more rows than col1 < 10
, in which case we prefer to use index on col1
. But then, it may return less rows than col1 BETWEEN 100 AND 110
, in which case we prefer the index on col2
.
Your case is not very much different. MySQL estimates the number of rows returned by some index query. When you use more columns, MySQL gets the impression your index is likely to result with few rows. So it chooses to start with TableA
, then joins what should be very few rows with TableB
.
But if MySQL believes the index to return many rows, it may prefer starting with TableB
. Why is that? Because you are sorting on indexed columns of TableB
. Sorting is a lot of work, too. So MySQL may choose to first sort the rows, then join to TableA
and filter by fulltext index. It may not be a bad idea if the fulltext search yields with many rows anyhow.
Best Answer
You just need to link back to the Product table and sort by some other criteria on there. You can ask the Full Text engine to rank the results, but that really just ranks how closely the result matches their search, which is not what you are looking for. I have a sample here using MS SQL Server.
Since you didn't provide a schema, I put together something that would show what I mean.