I can provide with a general explanation, but it may not apply specifically to your particular case:
The way decision making works is by evaluation cost of execution plan, then picking up what is hopefully the cheapest plan. This you already know.
When it comes to indexing, though, stuff are getting interesting. The way to evaluate the usefulness or viability of an index is to estimate the selectivity given some value.
For the moment, forget about your FULLTEXT index, and let's assume a simple index on some column col1
, and another index on some column col2
. Given the following two queries:
SELECT * FROM t WHERE col1 < 10 and col2 = 4;
SELECT * FROM t WHERE col1 BETWEEN 100 AND 110 and col2 = 4;
It may happen that the query is evaluated differently in these two cases. Why? Because it may happen that col2 = 4
returns more rows than col1 < 10
, in which case we prefer to use index on col1
. But then, it may return less rows than col1 BETWEEN 100 AND 110
, in which case we prefer the index on col2
.
Your case is not very much different. MySQL estimates the number of rows returned by some index query. When you use more columns, MySQL gets the impression your index is likely to result with few rows. So it chooses to start with TableA
, then joins what should be very few rows with TableB
.
But if MySQL believes the index to return many rows, it may prefer starting with TableB
. Why is that? Because you are sorting on indexed columns of TableB
. Sorting is a lot of work, too. So MySQL may choose to first sort the rows, then join to TableA
and filter by fulltext index. It may not be a bad idea if the fulltext search yields with many rows anyhow.
If I understand your question correctly and it isnt at all clear, you are trying to pull back information as far as what id's are connected with a value. I don't think you can just pull it from the index in PostgreSQL because the index will not contain visibility info, and so you will have a LOT of random IO and waiting for platters to turn.
The query for your test case is:
select thing, array_agg(id) from test group by thing;
Assuming you are on a version high enough to have array_agg.
In my system (9.1) this gives me:
chris=> select thing, array_agg(id) from test group by thing;
thing | array_agg
---------+-----------
'one' | {1,4}
'two' | {2,5}
'three' | {3}
(3 rows)
that's what you are looking for, right?
Best Answer
The easiest way is to get it from the MySQL Docs.
When you go to https://dev.mysql.com/doc/refman/5.5/en/fulltext-stopwords.html you will see the 543 words that are the default stopwords for MyISAM fulltext indexes. There is 'Copy to Clipboard' icon in the upper right corner of that list. Click on that.
You could then construct an SQL file to build that list.
EXAMPLE
I copied that to local text file in Ubuntu
The file looks like this:
I ran a loop across the file to tokenize it into another file.
Then, I jumped into
vi
and added SQL around the words and created this:NOTE: There are 47 words with an apostrophe in it (such as
ain't
andcan't
). So, I had to change them to double single quotes in the SQL file above.All you need to do is execute the aforementioned SQL. This would create the database
cust_stop
. with the tablestopwords
. The table has a single column calledVALUE
(as specified in the MariaDB documentation).After you created the stopword table, now configure MariaDB.conf with innodb_ft_server_stopword_table:
and restart MariaDB.
GIVE IT A TRY !!!
Note: If you are doing this in Galera, MariaDB will have a little fit because every table must have a primary key. You could either configure Galera not to be strict (be permissive) or just use the stopword file option instead.
Please see my old posts on making a text file stopword list.
UPDATE 2019-08-06 10:08 EDT
The link you gave to the MariaDB Documentation gives you the answer.
Just run
Unfortunately, MyISAM stopwords are not exposed to the information_schema like InnoDB stopwords are exposed. My answer simply provides a way to populate the INFORMATION_SCHEMA.INNODB_FT_DEFAULT_STOPWORD table if you wanted these stopwords exposed to InnoDB.
Please keep in mind that for InnoDB, 36 words is the "normal" table and is exposed. For MyISAM, the 543 word "normal" table is not exposed.
UPDATE 2019-08-06 12:43 EDT
If you are interested in knowing what stopwords are used in a MyISAM table with a fulltext index, then you will have to use myisam_ftdump. I have only referred to it once in this forum (mysql MATCH relavancy score).
I have it installed on my laptop for MySQL 5.5
When you run
myisam_ftdump -c
against a MyISAM table, you will get each word and its count. You will have to parse the words out of that output and compare each word to a text file that contains the 543 MyISAM stopwords. Any word in the Fulltext Dump that are in that list is using the stopword. I will leave that as an exercise for you.Please keep in mind that myisam_ftdump is a utility program, not a client program. What's the difference ??? A client program (such as mysql, mysqldump, mysqladmin) requires you to supply a username and password to access the table in mysqld. A utility program (such as mysaimpack, myisamchk) can be used against a table but no process should be accessing the MyISAM table (no open file handles, no running SQL). This means that you will have to copy the MyISAM table to another folder outside of datadir to use the myisam_ftdump against that MyISAM in the outside folder or you risk corrupting the table.
For the difference between client and utility programs, click the following:
UPDATE 2019-08-07 10:24 EDT
Here is a desperate query
This is an ugly Cartesian Join, the worst of its kind. It should locate a stopword at the beginning, end or middle of a text field.