I have an interesting surprise for you.
The only optimizing you can do for FULLTEXT indexing is not really about tuning my.cnf. It is all about two things:
- The Stopword List
- The Query
STOPWORDS
There are 543 built-in stopwords that you may or may not want filtered out of FULLTEXT indexes. That list was built in at compile time, but you can override it with your own list as follows:
First, create the stopword list. I usually set the English articles as the only stopwords.
echo "a" > /var/lib/mysql/stopwords.txt
echo "an" >> /var/lib/mysql/stopwords.txt
echo "the" >> /var/lib/mysql/stopwords.txt
Next, add the stopword file option to /etc/my.cnf, and lower ft_min_word_len (the default is 4) so that 1-letter, 2-letter, and 3-letter words are indexed as well:
[mysqld]
ft_min_word_len=1
ft_stopword_file=/var/lib/mysql/stopwords.txt
Finally, restart MySQL:
service mysql restart
If you have any tables with FULLTEXT indexes already in place, you must drop those FULLTEXT indexes and create them again.
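The rebuild step can be sketched as follows. The names my_table, ft_txt, and txt are hypothetical placeholders, so substitute your own table, index, and column. For MyISAM tables, the MySQL manual also describes a quick repair as sufficient after changing ft_min_word_len or ft_stopword_file; treat this as a sketch and test it on a copy first.

```sql
-- Confirm the new full-text settings took effect after the restart
SHOW VARIABLES LIKE 'ft%';

-- Option 1: drop and recreate the FULLTEXT index
-- (my_table, ft_txt, and txt are placeholder names, not part of the original setup)
ALTER TABLE my_table DROP INDEX ft_txt;
ALTER TABLE my_table ADD FULLTEXT ft_txt (txt);

-- Option 2 (MyISAM only): a quick repair also rebuilds the FULLTEXT indexes
REPAIR TABLE my_table QUICK;
```

If you do not know the index name, SHOW CREATE TABLE my_table lists it. A FULLTEXT index declared without a name takes the name of its first column.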
QUERY
Here is a little-known fact about MySQL queries that use a FULLTEXT index: there are occasions when the MySQL Query Optimizer stops using FULLTEXT indexes altogether and performs a full table scan.
Here is an example:
use test
drop table if exists ft_test;
create table ft_test
(
id int not null auto_increment,
txt text,
primary key (id),
FULLTEXT (txt)
) ENGINE=MyISAM;
insert into ft_test (txt) values
('mount camaroon'),('mount camaron'),('mount camnaroon'),
('mount cameroon'),('mount cemeroon'),('mount camnaroon'),
('mount camraon'),('mount camaraon'),('mount camaran'),
('mount camnaraon'),('mount cameroan'),('mount cemeroan'),
('mount camnaraon'),('munt camraon'),('munt camaraon'),
('munt camaran'),('munt camnaraon'),('munt cameroan'),
('munt cemeroan'),('munt camnaraon'),('mount camraan');
select * from ft_test WHERE MATCH(txt) AGAINST ("+mount +cameroon" IN BOOLEAN MODE);
Here is that sample data loaded:
mysql> use test
Database changed
mysql> drop table if exists ft_test;
Query OK, 0 rows affected (0.00 sec)
mysql> create table ft_test
-> (
-> id int not null auto_increment,
-> txt text,
-> primary key (id),
-> FULLTEXT (txt)
-> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.03 sec)
mysql> insert into ft_test (txt) values
-> ('mount camaroon'),('mount camaron'),('mount camnaroon'),
-> ('mount cameroon'),('mount cemeroon'),('mount camnaroon'),
-> ('mount camraon'),('mount camaraon'),('mount camaran'),
-> ('mount camnaraon'),('mount cameroan'),('mount cemeroan'),
-> ('mount camnaraon'),('munt camraon'),('munt camaraon'),
-> ('munt camaran'),('munt camnaraon'),('munt cameroan'),
-> ('munt cemeroan'),('munt camnaraon'),('mount camraan');
Query OK, 21 rows affected (0.00 sec)
Records: 21 Duplicates: 0 Warnings: 0
mysql>
Here is a sample query and its EXPLAIN plan:
mysql> select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE);
+----+----------------+
| id | txt |
+----+----------------+
| 4 | mount cameroon |
+----+----------------+
1 row in set (0.00 sec)
mysql> explain select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE)\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ft_test
type: fulltext
possible_keys: txt
key: txt
key_len: 0
ref:
rows: 1
Extra: Using where
1 row in set (0.00 sec)
mysql>
OK, great, the FULLTEXT index is used.
Now, let's change the query slightly:
mysql> select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE) = 1;
+----+----------------+
| id | txt |
+----+----------------+
| 4 | mount cameroon |
+----+----------------+
1 row in set (0.00 sec)
mysql> explain select * from ft_test WHERE MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE) = 1\G
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: ft_test
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 21
Extra: Using where
1 row in set (0.00 sec)
mysql>
OMG, what happened to the FULLTEXT index? The MySQL Query Optimizer basically barfed at it: comparing the MATCH ... AGAINST expression to a value (= 1) made it give up on the index and scan all 21 rows. If you were performing a JOIN against the ft_test table and the fulltext search in the WHERE clause triggered the same behavior, who knows what on earth would happen to the rest of the query plan.
The solution is to refactor the query so that the FULLTEXT search is isolated in a subquery that gathers only the keys. Then LEFT JOIN those keys back to the original table.
EXAMPLE
SELECT B.*
FROM (SELECT id from ft_test
WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A
LEFT JOIN ft_test B USING (id);
For this query, here is the result and its EXPLAIN
mysql> SELECT B.*
-> FROM (SELECT id from ft_test
-> WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A
-> LEFT JOIN ft_test B USING (id);
+----+----------------+
| id | txt |
+----+----------------+
| 4 | mount cameroon |
+----+----------------+
1 row in set (0.00 sec)
mysql> explain SELECT B.*
-> FROM (SELECT id from ft_test
-> WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A
-> LEFT JOIN ft_test B USING (id)\G
*************************** 1. row ***************************
id: 1
select_type: PRIMARY
table: <derived2>
type: system
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 1
Extra:
*************************** 2. row ***************************
id: 1
select_type: PRIMARY
table: B
type: const
possible_keys: PRIMARY
key: PRIMARY
key_len: 4
ref: const
rows: 1
Extra:
*************************** 3. row ***************************
id: 2
select_type: DERIVED
table: ft_test
type: fulltext
possible_keys: txt
key: txt
key_len: 0
ref:
rows: 1
Extra: Using where
3 rows in set (0.00 sec)
mysql>
Notice that in the DERIVED row of the EXPLAIN plan (id 2, the subquery behind <derived2>), the FULLTEXT index was indeed used.
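The same isolation pattern can be carried over to queries that join other tables. The following is only a sketch: ft_test_meta and its region column are hypothetical and are not part of the test data above. Run EXPLAIN on your own server to confirm the plan, since optimizer behavior differs between MySQL versions.

```sql
-- Hypothetical companion table, assumed to share ft_test's primary key:
--   ft_test_meta (id INT PRIMARY KEY, region VARCHAR(50))
SELECT B.*, M.region
FROM (SELECT id FROM ft_test
      WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A  -- fulltext search gathers keys only
LEFT JOIN ft_test B USING (id)        -- fetch the full rows by primary key
LEFT JOIN ft_test_meta M USING (id);  -- remaining joins also work from those keys
```

The idea is that the fulltext lookup stays alone in the subquery, and every other table is reached through the id values it returns.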
MORAL OF THE STORY
You will have to get into the habit of deciding how many stopwords your database will have, creating that stopword list, configuring it, and then creating or recreating all FULLTEXT indexes. You must also get into the habit of refactoring your FULLTEXT search queries so that the MySQL Query Optimizer does not generate a bad EXPLAIN plan or stop using indexes for the rest of the query.
Best Answer
Yes, you can do that using ts_rank. You can use setweight to give different parts of the text search vector different weights. Then the title will be ranked higher (weight A) than the content (weight C).
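The original code for this answer is not shown here, so the following is a minimal sketch of the idea rather than the author's query. ts_rank, setweight, and to_tsvector are PostgreSQL text search functions. The table articles(id, title, content), the 'english' configuration, and the search term are assumptions made for illustration.

```sql
-- Hypothetical table: articles (id, title, content)
SELECT id, title, ts_rank(doc, query) AS rank
FROM (
    SELECT id, title,
           -- title lexemes get weight A, content lexemes get weight C
           setweight(to_tsvector('english', coalesce(title, '')), 'A') ||
           setweight(to_tsvector('english', coalesce(content, '')), 'C') AS doc
    FROM articles
) AS d,
to_tsquery('english', 'cameroon') AS query
WHERE doc @@ query
ORDER BY rank DESC;
```

With ts_rank's default weights, a match in the title contributes more to the rank than the same match in the content.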