Mysql – WHERE IN subquery too slow

existsMySQLoptimizationwhere

I have a FULLTEXT search on the database, with the WHERE IN condition which checks that search results exists in another table. The query takes more than 1 hour to complete (companies table has 67k rows, and employees table has 82k rows).

Desired result: get search results for "company ABC" in table A only if we have employees data for that company in table B.

The query I have is:

SELECT *,
        MATCH (name, keywords, description) AGAINST ('+searchKeyword*') as relevance,
        (SELECT COUNT(customerID) FROM customers WHERE MATCH (name, keywords, description) AGAINST ('+searchKeyword*' IN BOOLEAN MODE) AND customerID IN(SELECT customerID FROM customersPersonelData)) as totalResults
FROM customers
WHERE MATCH (name, keywords, description) AGAINST ('+searchKeyword*' IN BOOLEAN MODE)
    AND customerID IN(SELECT customerID FROM customersPersonelData)
ORDER BY relevance DESC, name ASC
LIMIT 0,10

For testing I removed "relevance" and "totalResults" subqueries, and this query then takes 72 seconds:

SELECT *
FROM customers
WHERE MATCH (name, keywords, description) AGAINST ('+searchKeyword*' IN BOOLEAN MODE)
    AND customerID IN(SELECT customerID FROM customersPersonelData)
ORDER BY name ASC
LIMIT 0,10

Query explanation:
enter image description here

I tried using EXIST and JOIN, the result is the same.
How can I speed this query up?

Best Answer

If I understand correctly, you are getting count to satisfy your Pagination requirements. Generally, two separate queries are executed; one to get the 10 records, and another to get the total count.

The problem in your approach is that you are calculating the Count in a subquery inside the SELECT clause. For every row returned by the SELECT clause, is going to be executing the same count again and again. You can move the count calculation part to a Derived Table instead (Subquery inside the FROM clause), and CROSS JOIN it.

Another noteworthy thing is that MySQL is (unfortunately) generally inefficient when optimizing subqueries inside the IN(...) condition. You can convert this into EXISTS based Correlated Subquery instead (JOIN may result in Duplicates).

SELECT c.*,
       MATCH (c.name, c.keywords, c.description) AGAINST ('+searchKeyword*') as relevance,
       dt.totalResults         
FROM customers c 
JOIN customersPersonelData cpd ON cpd.customerID = c.customerID 
CROSS JOIN 
( 
  SELECT COUNT(c1.customerID) AS totalResults
  FROM customers c1
  WHERE MATCH (c1.name, c1.keywords, c1.description) AGAINST ('+searchKeyword*' IN BOOLEAN MODE) 
    AND EXISTS (SELECT 1 FROM customersPersonelData cpd1 
                WHERE cpd1.customerID = c1.customerID)
) dt
WHERE 
  MATCH (c.name, c.keywords, c.description) AGAINST ('+searchKeyword*' IN BOOLEAN MODE) 
    AND EXISTS (SELECT 1 FROM customersPersonelData cpd 
                WHERE cpd.customerID = c.customerID)
ORDER BY relevance DESC, c.name ASC
LIMIT 0,10

For good performance, you would need an index on customerID in customersPersonelData table. Also, a composite Full Text Index should exists on (name, keywords, description) columns in the customers table.

Now, if I were you, and this was a Pagination problem, I would look at this article to rethink how pagination should be done: https://mariadb.com/kb/en/library/pagination-optimization/