MySQL full-text search: my.cnf optimization

full-text-search my.cnf myisam mysql

I originally asked this question at https://serverfault.com/questions/353888/mysql-full-text-search-cause-high-usage-cpu and a user there recommended asking here.

We built a news site. Every day we import tens of thousands of records from a web API.

In order to provide a precise search service, our table uses MyISAM with a FULLTEXT index on (title, content, date). The site is being tested on a GoDaddy VDS with 2GB RAM and 30GB of disk space (no swap, because the VDS does not allow creating swap). The CPU is an Intel(R) Xeon(R) CPU L5609 @ 1.87GHz.

After running ./mysqltuner.pl, we get these results:

-------- General Statistics --------------------------------------------------
[--] Skipped version check for MySQLTuner script
[OK] Currently running supported MySQL version 5.5.20
[OK] Operating on 32-bit architecture with less than 2GB RAM

-------- Storage Engine Statistics -------------------------------------------
[--] Status: -Archive -BDB -Federated +InnoDB -ISAM -NDBCluster
[--] Data in MyISAM tables: 396M (Tables: 39)
[--] Data in InnoDB tables: 208K (Tables: 8)
[!!] Total fragmented tables: 9

-------- Security Recommendations  -------------------------------------------
[!!] User '@ip-XX-XX-XX-XX.ip.secureserver.net'
[!!] User '@localhost'

-------- Performance Metrics -------------------------------------------------
[--] Up for: 17h 27m 58s (1M q [20.253 qps], 31K conn, TX: 513M, RX: 303M)
[--] Reads / Writes: 61% / 39%
[--] Total buffers: 168.0M global + 2.7M per thread (151 max threads)
[OK] Maximum possible memory usage: 573.8M (28% of installed RAM)
[OK] Slow queries: 0% (56/1M)
[!!] Highest connection usage: 100%  (152/151)
[OK] Key buffer size / total MyISAM indexes: 8.0M/162.5M
[OK] Key buffer hit rate: 100.0% (2B cached / 882K reads)
[!!] Query cache is disabled
[OK] Sorts requiring temporary tables: 0% (0 temp sorts / 17K sorts)
[!!] Temporary tables created on disk: 49% (32K on disk / 64K total)
[!!] Thread cache is disabled
[!!] Table cache hit rate: 0% (400 open / 298K opened)
[OK] Open file limit used: 41% (421/1K)
[!!] Table locks acquired immediately: 77%
[OK] InnoDB data size / buffer pool: 208.0K/128.0M

-------- Recommendations -----------------------------------------------------
General recommendations:
    Run OPTIMIZE TABLE to defragment tables for better performance
    MySQL started within last 24 hours - recommendations may be inaccurate
    Enable the slow query log to troubleshoot bad queries
    Reduce or eliminate persistent connections to reduce connection usage
    When making adjustments, make tmp_table_size/max_heap_table_size equal
    Reduce your SELECT DISTINCT queries without LIMIT clauses
    Set thread_cache_size to 4 as a starting value
    Increase table_cache gradually to avoid file descriptor limits
    Optimize queries and/or use InnoDB to reduce lock wait
Variables to adjust:
    max_connections (> 151)
    wait_timeout (< 28800)
    interactive_timeout (< 28800)
    query_cache_size (>= 8M)
    tmp_table_size (> 16M)
    max_heap_table_size (> 16M)
    thread_cache_size (start at 4)
    table_cache (> 400)

And here is my.cnf:

[mysqld]
port            = 3306
socket          = /tmp/mysql.sock
skip-external-locking
key_buffer_size = 256M
max_allowed_packet = 16M
max_connections = 1024
wait_timeout = 5
table_open_cache = 512
sort_buffer_size = 2M
read_buffer_size = 2M
read_rnd_buffer_size = 2M
myisam_sort_buffer_size = 128M
thread_cache_size = 8
query_cache_size= 256M
# Try number of CPU's*2 for thread_concurrency
thread_concurrency = 8
ft_min_word_len = 2
read_rnd_buffer_size=2M
tmp_table_size=128M

I am not sure how to optimize my.cnf based on the results returned by ./mysqltuner.pl.

Best Answer

I have an interesting surprise for you.

The only optimization you can really do for FULLTEXT indexing is not a matter of tuning my.cnf. It comes down to two things:

  1. The Stopword List
  2. The Query

STOPWORDS

There are 543 stopwords that you may or may not want filtered out of FULLTEXT indexes. The list of stopwords was built at compile time. You can override that list with your own list as follows:

First, create the stopword file. I usually make the English articles the only stopwords.

echo "a"    > /var/lib/mysql/stopwords.txt
echo "an"  >> /var/lib/mysql/stopwords.txt
echo "the" >> /var/lib/mysql/stopwords.txt

Next, add the stopword file option to /etc/my.cnf, and lower ft_min_word_len so that 1-letter, 2-letter, and 3-letter words are indexed (the default minimum word length is 4):

[mysqld]
ft_min_word_len=1
ft_stopword_file=/var/lib/mysql/stopwords.txt

Finally, restart MySQL:

service mysql restart

If you have any tables with FULLTEXT indexes already in place, you must drop those FULLTEXT indexes and create them again.
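
As a rough sketch of that rebuild step, the statements below first confirm that the server picked up the new settings and then rebuild the index. The table name news, the index name ft_news, and the column list are hypothetical stand-ins, so substitute your own. For MyISAM tables, the MySQL manual also describes a quick repair as sufficient to rebuild FULLTEXT indexes after changing ft_min_word_len or ft_stopword_file; it is shown as an alternative.

```sql
-- Confirm the restarted server is using the new full-text settings.
SHOW VARIABLES LIKE 'ft%';

-- Option 1: drop and re-create the FULLTEXT index.
-- "news", "ft_news", and the column list are hypothetical names.
ALTER TABLE news DROP INDEX ft_news;
ALTER TABLE news ADD FULLTEXT INDEX ft_news (title, content);

-- Option 2 (MyISAM only): a quick repair rebuilds the table's indexes,
-- including FULLTEXT ones, using the current settings.
REPAIR TABLE news QUICK;
```

Only one of the two options is needed. Either way, the table is locked while the index is rebuilt, so on a large table this is best done during a quiet period.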

QUERY

Here is a little-known fact about MySQL queries that use a FULLTEXT index: there are occasions when the MySQL Query Optimizer stops using FULLTEXT indexes altogether and performs full table scans.

Here is an example:

use test
drop table if exists ft_test;
create table ft_test
(
    id int not null auto_increment,
    txt text,
    primary key (id),
    FULLTEXT (txt)
) ENGINE=MyISAM;
insert into ft_test (txt) values
('mount camaroon'),('mount camaron'),('mount camnaroon'),
('mount cameroon'),('mount cemeroon'),('mount camnaroon'),
('mount camraon'),('mount camaraon'),('mount camaran'),
('mount camnaraon'),('mount cameroan'),('mount cemeroan'),
('mount camnaraon'),('munt camraon'),('munt camaraon'),
('munt camaran'),('munt camnaraon'),('munt cameroan'),
('munt cemeroan'),('munt camnaraon'),('mount camraan');
select * from ft_test WHERE  MATCH(txt) AGAINST ("+mount +cameroon" IN BOOLEAN MODE);

Here is that sample data loaded:

mysql> use test
Database changed
mysql> drop table if exists ft_test;
Query OK, 0 rows affected (0.00 sec)

mysql> create table ft_test
    -> (
    ->     id int not null auto_increment,
    ->     txt text,
    ->     primary key (id),
    ->     FULLTEXT (txt)
    -> ) ENGINE=MyISAM;
Query OK, 0 rows affected (0.03 sec)

mysql> insert into ft_test (txt) values
    -> ('mount camaroon'),('mount camaron'),('mount camnaroon'),
    -> ('mount cameroon'),('mount cemeroon'),('mount camnaroon'),
    -> ('mount camraon'),('mount camaraon'),('mount camaran'),
    -> ('mount camnaraon'),('mount cameroan'),('mount cemeroan'),
    -> ('mount camnaraon'),('munt camraon'),('munt camaraon'),
    -> ('munt camaran'),('munt camnaraon'),('munt cameroan'),
    -> ('munt cemeroan'),('munt camnaraon'),('mount camraan');
Query OK, 21 rows affected (0.00 sec)
Records: 21  Duplicates: 0  Warnings: 0

mysql>

Here is a sample query and its EXPLAIN plan

mysql> select * from ft_test WHERE  MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE);
+----+----------------+
| id | txt            |
+----+----------------+
|  4 | mount cameroon |
+----+----------------+
1 row in set (0.00 sec)

mysql> explain select * from ft_test WHERE  MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE)\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ft_test
         type: fulltext
possible_keys: txt
          key: txt
      key_len: 0
          ref:
         rows: 1
        Extra: Using where
1 row in set (0.00 sec)

mysql>

Great, the FULLTEXT index is used.

Now, let's change the query slightly:

mysql> select * from ft_test WHERE  MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE) = 1;
+----+----------------+
| id | txt            |
+----+----------------+
|  4 | mount cameroon |
+----+----------------+
1 row in set (0.00 sec)

mysql> explain select * from ft_test WHERE  MATCH(txt) AGAINST ("cameroon" IN BOOLEAN MODE) = 1\G
*************************** 1. row ***************************
           id: 1
  select_type: SIMPLE
        table: ft_test
         type: ALL
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 21
        Extra: Using where
1 row in set (0.00 sec)

mysql>

OMG, what happened to the FULLTEXT index? The MySQL Query Optimizer basically barfed at it: simply comparing the MATCH result to a value (= 1) made it fall back to a full table scan (type: ALL, rows: 21). If you were performing a JOIN against the ft_test table and the optimizer did the same thing to the FULLTEXT search in the WHERE clause, who knows what on earth would happen to the rest of the query.

The solution would be to refactor the query and attempt to isolate the FULLTEXT search in a subquery that gathers only the keys. Then LEFT JOIN those keys back to the original table.

EXAMPLE

SELECT B.*
FROM (SELECT id from ft_test
WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A
LEFT JOIN ft_test B USING (id);

For this query, here is the result and its EXPLAIN

mysql> SELECT B.*
    -> FROM (SELECT id from ft_test
    -> WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A
    -> LEFT JOIN ft_test B USING (id);
+----+----------------+
| id | txt            |
+----+----------------+
|  4 | mount cameroon |
+----+----------------+
1 row in set (0.00 sec)

mysql> explain SELECT B.*
    -> FROM (SELECT id from ft_test
    -> WHERE MATCH(txt) AGAINST ("+cameroon" IN BOOLEAN MODE)) A
    -> LEFT JOIN ft_test B USING (id)\G
*************************** 1. row ***************************
           id: 1
  select_type: PRIMARY
        table: <derived2>
         type: system
possible_keys: NULL
          key: NULL
      key_len: NULL
          ref: NULL
         rows: 1
        Extra:
*************************** 2. row ***************************
           id: 1
  select_type: PRIMARY
        table: B
         type: const
possible_keys: PRIMARY
          key: PRIMARY
      key_len: 4
          ref: const
         rows: 1
        Extra:
*************************** 3. row ***************************
           id: 2
  select_type: DERIVED
        table: ft_test
         type: fulltext
possible_keys: txt
          key: txt
      key_len: 0
          ref:
         rows: 1
        Extra: Using where
3 rows in set (0.00 sec)

mysql>

Notice that in the DERIVED part of the EXPLAIN plan (id 2, the third row), the FULLTEXT index was indeed used.
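
If you really do need to compare the relevance score to a value, as in the "= 1" query earlier, one possible variation on the same pattern is to compute the score inside the key-gathering subquery and apply the comparison in the outer query. This is only a sketch against the ft_test table above. The alias score and the threshold are illustrative assumptions, and it relies on the server materializing the derived table, as MySQL 5.5 does. Newer versions may merge derived tables into the outer query, so verify the plan with EXPLAIN.

```sql
-- The inner WHERE stays a bare MATCH ... AGAINST so that the FULLTEXT
-- index can be used; the score comparison is applied to the (small)
-- derived result instead of to the base table.
SELECT B.*, A.score
FROM (SELECT id,
             MATCH(txt) AGAINST ('+cameroon' IN BOOLEAN MODE) AS score
      FROM ft_test
      WHERE MATCH(txt) AGAINST ('+cameroon' IN BOOLEAN MODE)) A
LEFT JOIN ft_test B USING (id)
WHERE A.score >= 1;  -- hypothetical threshold, adjust as needed
```

Prefix the statement with EXPLAIN and check that the DERIVED row still shows type: fulltext before relying on it.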

MORAL OF THE STORY

You will have to get into the habit of deciding how many stopwords your database will have, creating that stopword list, configuring it, and then creating or re-creating all FULLTEXT indexes. You must also get into the habit of refactoring your FULLTEXT search queries so that the MySQL Query Optimizer does not generate a bad EXPLAIN plan or stop using indexes for the rest of the query.