Optimize Query with Indexes in MariaDB

mariadboptimization

The following query is looking for all towns that are between 5 and 10 kilometers from a hard-coded latitude/logitude (=another town actually). I have 37010 towns.

I'm using Symfony 2 which "creates" a query through Doctrine. This query is flagged by my MariaDB server as "not using indexes". I dont know what's going on, because:
(1) Here's the query:

SELECT
    v0_.id AS id0,
    v0_.nom AS nom1,
    v0_.url AS url2,
    v0_.cp AS cp3,
    v0_.insee AS insee4,
    ROUND(
        6371 *
        ACOS(COS(RADIANS(50.58907000)) *
        COS(RADIANS(v0_.lat)) *
        COS(RADIANS(v0_.lng) -
        RADIANS(3.16710500)) +
        SIN(RADIANS(50.58907000)) *
        SIN(RADIANS(v0_.lat))), 2
    ) AS sclr5
    FROM ville v0_
    HAVING sclr5 > 4 AND sclr5 <= 10
    ORDER BY sclr5 ASC LIMIT 20 OFFSET 0;

Here's the time:

# User@Host: x[x] @ localhost []
# Thread_id: 1514  Schema: mydatabase  QC_hit: No
# Query_time: 0.071503  Lock_time: 0.000137  Rows_sent: 20  Rows_examined: 37030

And here's the table:

MariaDB [mydatabase]> desc ville;
+-----------------+---------------------+------+-----+---------+----------------+
| Field           | Type                | Null | Key | Default | Extra          |
+-----------------+---------------------+------+-----+---------+----------------+
| id              | bigint(20) unsigned | NO   | PRI | NULL    | auto_increment |
| id_origine      | bigint(20) unsigned | YES  | MUL | NULL    |                |
| date_v_creation | datetime            | YES  | MUL | NULL    |                |
| date_v_debut    | datetime            | YES  | MUL | NULL    |                |
| date_v_fin      | datetime            | YES  | MUL | NULL    |                |
| article         | varchar(4)          | YES  |     |         |                |
| nom             | varchar(150)        | NO   | MUL |         |                |
| url             | varchar(150)        | NO   | MUL |         |                |
| cp              | varchar(10)         | NO   | MUL | NULL    |                |
| insee           | varchar(10)         | NO   |     |         |                |
| id_region       | bigint(20) unsigned | NO   | MUL | NULL    |                |
| id_departement  | bigint(20) unsigned | NO   | MUL | NULL    |                |
| lat             | decimal(15,8)       | NO   | MUL | NULL    |                |
| lng             | decimal(15,8)       | NO   |     | NULL    |                |
| sound           | varchar(252)        | YES  | MUL | NULL    |                |
+-----------------+---------------------+------+-----+---------+----------------+
15 rows in set (0.01 sec)

MariaDB [mydatabase]>

If I do an explain on this query i get:

+------+-------------+-------+------+---------------+------+---------+------+-------+----------------+
| id   | select_type | table | type | possible_keys | key  | key_len | ref  | rows  | Extra          |
+------+-------------+-------+------+---------------+------+---------+------+-------+----------------+
|    1 | SIMPLE      | v0_   | ALL  | NULL          | NULL | NULL    | NULL | 36510 | Using filesort |
+------+-------------+-------+------+---------------+------+---------+------+-------+----------------+

1 row in set (0.00 sec)

I have 37010 towns and MariaBN tells me Rows_examined: 37030. I dont get that. Anyway, how would you optimize it?

Best Answer

The problem is that you are only filtering by that large calculation so there is no choice but to do a full table scan and calculate that formula for every row.

You could limit the number of rows considered by adding an index on the lat and long columns and filter by an approximate square as well as the exact crow-flies calculation. That way the query runner should be able to seek using that index to find those in the larger area of the square ("x between n1 and n2 & y between n3 and n4" should use that index (an index seek followed by a partial scan)) then only need to lookup & read the full rows and do the calc & sort on those few matches to find the final filtered/limited set.

If you have the chance to consider changing database backend at this point in your project some (postgres with the PostGIS extensions for example) support special types and indexes for handling geometry data (even sometimes natively supporting lat/long based calculations), though that change may be overkill if this is the only part of your application that would make use of it.