I can't get full-text search working the way I want it to, and I don't understand the differences between the result lists.
Example statements:
SELECT `meldungstext`
FROM `artikel`
WHERE `meldungstext` LIKE '%punkt%'
This returns 92 rows, including rows where the column meldungstext contains words such as "Punkten", "Zwei-Punkte-Vorsprung" and "Treffpunkt".
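To see why LIKE '%punkt%' catches all three spellings, here is a small Python model of LIKE matching (not MySQL code): % stands for any run of characters, _ for exactly one character, and the whole value must match the pattern. The helpers like_to_regex and sql_like are hypothetical names, escape sequences are not handled, and the case-insensitive comparison assumes the column uses a case-insensitive (_ci) collation.

```python
import re


def like_to_regex(pattern: str) -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into a compiled regular expression.

    Simplified model: '%' matches any run of characters (including none),
    '_' matches exactly one character, everything else is literal.
    Escape sequences such as '\\%' are NOT supported here.
    """
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    # IGNORECASE mimics a *_ci collation (an assumption about the table).
    return re.compile("".join(parts), re.IGNORECASE | re.DOTALL)


def sql_like(text: str, pattern: str) -> bool:
    # LIKE must match the entire value, hence fullmatch instead of search.
    return like_to_regex(pattern).fullmatch(text) is not None


examples = ["Punkten", "Zwei-Punkte-Vorsprung", "Treffpunkt"]
for s in examples:
    print(s, sql_like(s, "%punkt%"))
```

Because % can absorb text on both sides, the pattern finds punkt anywhere inside a word, which is something a word-based full-text index does not attempt.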
I created a FULLTEXT index on the column "meldungstext" and tried this:
SELECT `meldungstext`
FROM `artikel`
WHERE MATCH (`meldungstext`)
AGAINST ('*punkt*')
This returns only 8 rows. I get only rows that contain "Punkt" itself, or words that I think are treated as "Punkt", as in "i-Punkt".
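A rough sketch of what natural language mode compares, again as a simplified Python model rather than MySQL's actual parser: the text is split into words at non-word characters such as hyphens, and the search string is split the same way, so the asterisks in '*punkt*' carry no meaning and only the whole word punkt is looked up. tokenize and natural_match are hypothetical helpers; relevance ranking, stopwords, the minimum word length and the 50% threshold are all ignored by this model.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Simplified word splitter: a word is a run of letters, digits or
    # underscores, so hyphens and asterisks act as separators. Words are
    # lowercased because matching on non-binary columns ignores case.
    return [w.lower() for w in re.findall(r"\w+", text)]


def natural_match(text: str, query: str) -> bool:
    # True if any whole word of the query equals a whole word of the text.
    return bool(set(tokenize(query)) & set(tokenize(text)))


samples = ["Punkt", "i-Punkt", "Punkten", "Zwei-Punkte-Vorsprung", "Treffpunkt"]
for s in samples:
    print(s, natural_match(s, "*punkt*"))
```

Only "Punkt" and "i-Punkt" contain the standalone word punkt; "Punkten", "Punkte" and "Treffpunkt" are different words to the index.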
I then tried boolean mode:
SELECT `meldungstext`
FROM `artikel`
WHERE MATCH (`meldungstext`)
AGAINST ('*punkt*' IN BOOLEAN MODE)
This returns 44 rows. I get rows that have "Zwei-Punkte-Vorsprung" or "Treffpunkt" in the column meldungstext, but not those with "Punkten".
Why does this happen, and how can I set up a "fully" working full-text search so that I can avoid using LIKE '%%' in the WHERE clause?
Best Answer
I took the three strings in your question and added them to a table, plus three more strings with pankt instead of punkt. The following was executed using MySQL 5.5.12 for Windows.
I ran these queries against the table using 3 different approaches:
- MATCH ... AGAINST
- LOCATE (as in the LOCATE function)
- LIKE
Please note the differences.
All the PunktMatch values should be 3 1's and 3 0's.
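The expected PunktMatch column can be reproduced with a Python stand-in for LOCATE(substr, str), which returns the 1-based position of the first occurrence, or 0 when there is none. The six rows assume the pankt variants are spelled Pankten, Zwei-Pankte-Vorsprung and Treffpankt, since the answer does not list them; locate and rows are illustrative names, and the case-insensitive comparison assumes a _ci collation.

```python
def locate(substr: str, text: str) -> int:
    # Mirrors MySQL LOCATE(substr, str): 1-based position of the first
    # occurrence, or 0 if absent. Lowercasing models a *_ci collation.
    return text.lower().find(substr.lower()) + 1


# The three strings from the question plus three assumed "pankt" variants.
rows = ["Punkten", "Zwei-Punkte-Vorsprung", "Treffpunkt",
        "Pankten", "Zwei-Pankte-Vorsprung", "Treffpankt"]

punkt_match = [int(locate("punkt", r) > 0) for r in rows]
pankt_match = [int(locate("pankt", r) > 0) for r in rows]
print(punkt_match)  # [1, 1, 1, 0, 0, 0]
print(pankt_match)  # [0, 0, 0, 1, 1, 1]
```

LIKE '%punkt%' yields the same 1's and 0's here, since both tests look for the substring anywhere in the value.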
Now watch me query them as normal.
OK, using MATCH ... AGAINST with punkt does not work. What about pankt???
Let's run my big GROUP BY query against pankt. This is wrong as well, because I should see 3 0's and 3 1's for PanktMatch.
I tried something else:
I added a plus sign to pankt and got different results. Why 2 and not 3???
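According to the MySQL documentation on Boolean full-text searches, the asterisk is the truncation (wildcard) operator: it is appended to the word it affects, and words match if they begin with the word preceding the *.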
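Based on this, the wildcard applies to the end of a token, not to the front. In light of this, the output must be correct, because punkt starts a token in only 2 of the 3 strings: "Punkten" and the "Punkte" in "Zwei-Punkte-Vorsprung" (the hyphens split that string into separate words), whereas in "Treffpunkt" it sits at the end of the word. Same story with pankt. This at least explains why 2 out of 3, and why fewer rows.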
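The same explanation as a Python model: words are split at non-word characters, and a term ending in * matches any word that begins with the term. boolean_prefix_match is a hypothetical helper, not a MySQL function; stripping a leading * reflects the reading above that the wildcard only works at the end of a word, and the six rows (with assumed pankt spellings) stand in for the answer's test table.

```python
import re
from typing import List


def tokenize(text: str) -> List[str]:
    # Simplified word splitter: runs of letters/digits/underscores, lowercased,
    # so "Zwei-Punkte-Vorsprung" becomes ["zwei", "punkte", "vorsprung"].
    return [w.lower() for w in re.findall(r"\w+", text)]


def boolean_prefix_match(text: str, term: str) -> bool:
    # Models BOOLEAN MODE 'term*': a word matches if it BEGINS with term.
    # In this model a leading '*' has no wildcard effect, so it is stripped.
    prefix = term.strip("*").lower()
    return any(w.startswith(prefix) for w in tokenize(text))


rows = ["Punkten", "Zwei-Punkte-Vorsprung", "Treffpunkt",
        "Pankten", "Zwei-Pankte-Vorsprung", "Treffpankt"]
for term in ("*punkt*", "*pankt*"):
    hits = [r for r in rows if boolean_prefix_match(r, term)]
    print(term, len(hits), hits)
```

Both terms hit 2 of the 3 rows, because Treffpunkt and Treffpankt contain the term only at the end of a word, where a trailing wildcard cannot reach.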