Mysql – Optimized MySQL “optional” FULLTEXT search on multiple tables

full-text-searchMySQLoptimization

I need to select data with optional columns in where clause based on client input.
my query is like this:

SELECT a.id, a.title, b.txt AS b_txt, c.txt AS c_txt 
FROM a 
LEFT JOIN b ON a.b_id=b.id 
LEFT JOIN c ON a.c_id=c.id 
where a.status=1 
AND (0=@bid OR a.b_id=@bid) 
AND (0=@cid OR a.c_id=@cid) 

@bid and @cid are client inputs, the values must be >0 with default to 0 if client does not provide the input.

client can provide no inputs so it will select all data where status=1 ignoring b_id and c_id columns

client can provide @bid or @cid or both and the query will select accordingly.

table engine is InnoDB, columns are indexed, foreign key and relationships set.

so far everything is fine. EXPLAIN SQL shows selection is done based on indexes provided.

now I need to add full-text search on all 3 tables to query is where I have problems. text search is also optional if client provides any keyword to search or not.

fulltext index is defined for a.title, b.txt and c.txt

I changed the query to:

SELECT a.id, a.title, b.txt AS b_txt, c.txt AS c_txt 
FROM a 
LEFT JOIN b ON a.b_id=b.id 
LEFT JOIN c ON a.c_id=c.id 
where a.status=1 
AND ('0'=@keywords OR (MATCH(a.title) AGAINST(@keywords IN BOOLEAN MODE) OR MATCH(b.txt) AGAINST(@keywords IN BOOLEAN MODE) OR MATCH(c.txt) AGAINST(@keywords IN BOOLEAN MODE))) 
AND (0=@bid OR a.b_id=@bid) 
AND (0=@cid OR a.c_id=@cid) 

the query seems to be working returning the results I want but the explain query returns type=ALL and a full table scan so no the query is not working in an optimized way.

if I change the OR between matches to AND then explain query returns type=fulltext and select is done on FULLTEXT index, but I need the OR style.

I was thinking about joining different result sets but couldn't find out how since the input is optional and there might be no input and therefore no full-text search needed to be done.

any solutions?

EDIT:

OK with thanks to jkavalik's comment and Rick's answer it seems I need to add some clarifications:

actually I am using a data service from WSO2 Data Services Server. so I am just passing the input parameters to the data service and I can not generate the select query based on user input. (there's an option to pass part of a query string as input parameter but I am not planning to do that for security reasons)

so I have two options:

  1. write separate queries which accept different sets of input parameters for each case of input parameter combinations. deciding to send the parameters to which query based on user inputs. well the optional inputs being 10+ in a real situation, that will be lots of queries and don't seem to be the right solution. optional fields may change over time, maintenance in future is expensive. doesn't looks like the standard way to handle this.

  2. write a single query in a way it can handle the optional fields.

I prefer option 2 unless there is a third option I am not aware of?

about the @variables they are either being replaced by input parameter's value or the default value (0 in this example) before query being passed to and executed in MySQL. and actually they are :variable not @variable but I changed it to @ in this example for better reading or understandings.

Best Answer

Wrong approach. Instead, "construct" the SELECT based on which fields the user provides.

Why? OR does a poor job of optimizing. If, instead, you remove the clause from WHERE, you don't need the test for 0, thereby avoiding the OR.

Worse yet is an OR spanning multiple tables, as you need to to with the FULLTEXT checks. The only way for the query, as it stands, to be run is to evaluate all combinations of all rows in all tables, then start filtering. (OK, I exaggerate, but not by much.)

A common way to avoid OR is to turn it into UNION:

( SELECT id FROM a WHERE MATCH(a...) ... )
UNION DISTINCT
( SELECT id FROM a JOIN b ON ... WHERE MATCH(b...) ... )
...

That will get all the possible values for a.id.

Why LEFT? That will slow down the 2nd and 3rd parts of the subquery for no good reason. Remove LEFT. If there is no b row, that's equivalent to the MATCH failing.

Then, put that in a subquery to get the rest of the desired data:

SELECT a.id, a.title, b.txt AS b_txt, c.txt AS c_txt 
FROM ( the above "union" query ) AS tmp
JOIN a 
LEFT JOIN b ON a.b_id=b.id 
LEFT JOIN c ON a.c_id=c.id;

In this case you might need LEFT; but remove it if you don't.

Other notes

While you are 'constructing' the query, you may as well plug in the @ values. @variables inhibit the optimizer in some cases; actual literals are more easily processed. Be sure to escape them to help prevent "sql injection".

In case it was not obvious, 0=@bid OR goes away.