Mysql – Unexpected query execution plan on MySQL 8 when using Datetime range

MySQLmysql-8.0

I have a MySQL 8 database with about 900 millions of rows. When I run the following query the response is very slow because table B doesn't seem to be using the right index:

SELECT  B.name

FROM A JOIN B ON A.key1 = B.key1 AND A.key2 = B.key2
WHERE
    A.active = True
    AND B.datetime  > '2020-05-28 00:00:00' AND B.datetime  < '2020-05-29 00:00:00'
    AND A.type = 1

The query has the following execution plan (EXPLAIN FORMAT=JSON):

{
  "query_block": {
    "select_id": 1,
    "cost_info": {
      "query_cost": "10202799.44"
    },
    "nested_loop": [
      {
        "table": {
          "table_name": "A",
          "access_type": "ref",
          "possible_keys": [
            "ix_A_type_key1",
          ],
          "key": "ix_A_type_active",
          "used_key_parts": [
            "type",
            "active"
          ],
          "key_length": "5",
          "ref": [
            "const",
            "const"
          ],
          "rows_examined_per_scan": 62738,
          "rows_produced_per_join": 62738,
          "filtered": "100.00",
          "cost_info": {
            "read_cost": "62738.00",
            "eval_cost": "6273.80",
            "prefix_cost": "69011.80",
            "data_read_per_join": "1G"
          },
          "used_columns": [
            "active",
            "type",
            "key1",
            "key2"
          ]
        }
      },
      {
        "table": {
          "table_name": "B",
          "access_type": "ref",
          "possible_keys": [
            "ix_B_key1_datetime",
            "ix_B_key2_datetime"
          ],
          "key": "ix_B_key2_datetime",
          "used_key_parts": [
            "key2"
          ],
          "key_length": "258",
          "ref": [
            "A.key2"
          ],
          "rows_examined_per_scan": 147,
          "rows_produced_per_join": 3136,
          "filtered": "0.03",
          "cost_info": {
            "read_cost": "9211170.19",
            "eval_cost": "313.69",
            "prefix_cost": "10202799.44",
            "data_read_per_join": "96M"
          },
          "used_columns": [
            "key1",
            "key2",
            "datetime",
            "name",
          ],
          "attached_condition": "((`B`.`key1` = `A`.`key1`) and (`B`.`datetime` > TIMESTAMP'2020-05-28 00:00:00') and (`B`.`datetime` < TIMESTAMP'2020-05-29 00:00:00'))"
        }
      }
    ]
  }
}

Table B In the used_key_parts it's only using the column key2 instead of the full composite index key2 + datetime (ix_B_key2_datetime). Why is not using the datetime part of the composite index? Is there a way to speed up this query?

Best Answer

For a closer analysis we would need the table definitions for both tables and indexes as well as the row counts, but I think the problem is that you use key2 for joining and mysql can then not use it for filtering of the datetime (since the first part of the index, the key2, is different for the rows in the join result).

Can you try to create an index on datetime alone? Maybe optimize the query like this afterwards:

select sub.name
from A join 
(select name,key1,key2 from B where B.datetime between '2020-05-28 00:00:00' AND '2020-05-29 00:00:00') sub
on A.key1 = sub.key1 and A.key2 = sub.key2
where A.active = True
and A.type = 1;

This way it could use the most restrictive filtering first (which I assume will be the datetime column on table B) and join the few rows it has with the many in table A.