Mysql – way to hint to query optimizer to MySQL which constraints should be done first

MySQLoptimizationsubquery

This is my current query:

SELECT BusinessID as ID,  
  111151.29341326*SQRT(pow(-6.186751-X(LatLong),2)+pow(106.772835-Y(LatLong),2)*0.98838574205337) AS Distance from
(
    SELECT *
    FROM
      tableauxiliary
    WHERE
      MBRContains(
    GeomFromText (
        'MULTIPOINT(-6.1934985598076 106.76604791159,-6.1800034401924 106.77962208841)'
        ),
        Latlong)=1  
    AND Prominent >15 
) AS TA

    Having Distance <= 18238
    ORDER BY
  Distance
LIMIT
  0, 45

Notice that they I used subquery. The reason why it uses subquery is because I want

      MBRContains(
    GeomFromText (
        'MULTIPOINT(-6.1934985598076 106.76604791159,-6.1800034401924 106.77962208841)'
        ),
        Latlong)=1   

to be done first. This reduces query time from 19 seconds to .9 seconds.

Is there a way to hint to mysql query optimizer so that I do not need to use subquery

Update:

I tried:

SELECT BusinessID as ID,  
  111151.29341326*SQRT(pow(-6.186751-X(LatLong),2)+pow(106.772835-Y(LatLong),2)*0.98838574205337) AS Distance from tableauxiliary 
USE Index (LatLong_2,FullTextSearch)
WHERE
    MBRContains(
    GeomFromText (
        'MULTIPOINT(-6.1934985598076 106.76604791159,-6.1800034401924 106.77962208841)'
        ),
        Latlong)  
    AND Prominent >15 
    AND MATCH FullTextSearch AGAINST ('a*' IN BOOLEAN MODE)
    ORDER BY
  Distance

Nothing change. If I use explain FullTextSearch is the only index used.

Note: The question is indeed about speed. What happen is I am sure that the query will run faster if mysql optimizer do spatial first before the fulltextsearch. This is because sometimes we are searching for a* in fulltext search that obviously don't limit by much.

Basically I have businesses database for the whole world and I am only interested in businesses within 500 meter square or 10km square at most. So obviously the spatial part will be far more restrictive.

While I am moving to mongodb, it'll be kind of nice if things work too for mysql.

Best Answer

I have rather bad news for you in this regard

MySQL Query Optimizer tends to stray away for further optimization once it sees a FULLTEXT index. I have written about this before in the StackExchange

You will have refactor the query

My suggestion would be to do the one of the following

SUGGESTION #1

  • Retrieve Business IDs suing the FULLTEXT Search
  • Perform GeoSpatial Search on the Retrieved IDs

SUGGESTION #2

  • Retrieve Business IDs using GeoSpatial Search (Since you said it take 0.9 seconds)
  • Perform the FULLTEXT Search on the GeoSpatial-Retrieved Businesses IDs

SUGGESTION #3

  • Retrieve Business IDs using GeoSpatial Search (Since you said it take 0.9 seconds) in Subquery1
  • Retrieve Business IDs using FULLTEXT Search in Subquery2
  • INNER JOIN Subquery1 and Subquery2

SUGGESTION #4

  • Retrieve Business IDs using GeoSpatial Search (Since you said it take 0.9 seconds)
  • DO NOT USE FullTextSerach. Use REGEXP against GeoSpatial-Retrieved Businesses IDs

Here is your original query:

SELECT BusinessID as ID,  
    111151.29341326*SQRT(pow(-6.186751-X(LatLong),2)+
    pow(106.772835-Y(LatLong),2)*0.98838574205337) AS Distance
from tableauxiliary 
USE Index (LatLong_2,FullTextSearch)
WHERE
    MBRContains(
    GeomFromText (
        'MULTIPOINT(-6.1934985598076 106.76604791159,-6.1800034401924 106.77962208841)'
        ),
        Latlong)  
    AND Prominent >15 
    AND MATCH FullTextSearch AGAINST ('a*' IN BOOLEAN MODE)
    ORDER BY
  Distance;

If the MySQL Query Optimizer detects a FULLTEXT search along with the conjunction AND followed by an expression within the same WHERE clause (thus, the same level of evaluation within an EXPLAIN plan), the MySQL Query Optimizer will take a vacation after working with the FULLTEXT search.

Perhaps you could try something like this (from SUGGESTION #1):

SELECT BusinessID as ID,  
    111151.29341326*SQRT(pow(-6.186751-X(LatLong),2)+
    pow(106.772835-Y(LatLong),2)*0.98838574205337) AS Distance
FROM
(
    SELECT BusinessID,LatLong
    FROM
    (
        SELECT BusinessID,Prominent,LatLong
        From tableauxiliary WHERE
        MATCH FullTextSearch AGAINST ('a*' IN BOOLEAN MODE)
    ) AA
    WHERE Prominent > 15
) A
WHERE
    MBRContains(
    GeomFromText (
        'MULTIPOINT(-6.1934985598076 106.76604791159,-6.1800034401924 106.77962208841)'
        ),
        Latlong)  
ORDER BY
    Distance
;

Notice you do not see the conjunction AND. Please check the EXPLAIN plan for this. The query should do the following:

  • Gather BusinessID,Prominent,LatLong That Matched the FULLTEXT search
  • Remove BusinessID,LatLong Whose Prominent value is < 15
  • Check Distance of remaining BusinessID values
  • sort the Check Distance results by Distance

UPDATE 2012-08-08 11:30 EDT

As stated earlier, the FULLTEXT index seems to abandon all hope of using other indexes. Perhaps two subqueries that are JOINed (SUGGESTION #3)

SELECT BusinessID as ID,  
    111151.29341326*SQRT(pow(-6.186751-X(LatLong),2)+
    pow(106.772835-Y(LatLong),2)*0.98838574205337) AS Distance
FROM
(
    SELECT BusinessID,Prominent From tableauxiliary WHERE
    MATCH FullTextSearch AGAINST ('a*' IN BOOLEAN MODE)
) A
INNER JOIN
(
    SELECT BusinessID From tableauxiliary WHERE
    MBRContains(
    GeomFromText (
        'MULTIPOINT(-6.1934985598076 106.76604791159,-6.1800034401924 106.77962208841)'
        ),
        Latlong)
) B
USING (BusinessID)
WHERE Prominent > 15
ORDER BY
    Distance
;

You may want to abandon using FULLTEXT searching altogether and try regular expressions (SUGGESTION #4)

SELECT B.BusinessID as ID,  
    111151.29341326*SQRT(pow(-6.186751-X(B.LatLong),2)+
    pow(106.772835-Y(B.LatLong),2)*0.98838574205337) AS Distance
FROM
(
    SELECT BusinessID From tableauxiliary WHERE
    MBRContains(
    GeomFromText (
        'MULTIPOINT(-6.1934985598076 106.76604791159,-6.1800034401924 106.77962208841)'
        ),
        Latlong)
) A INNER JOIN tableauxiliary B USING (BusinessID)
WHERE B.Prominent > 15 AND
((B.FullTextSearch REGEXP ' a*') OR (B.FullTextSearch REGEXP '^a*'))
ORDER BY
    Distance
;