Mysql – LEFT JOIN on BETWEEN not using indexes

indexjoin;MySQLperformancequery-performance

I am trying to join two tables on the criteria that the value one column of table1 is between the values of two columns of table2.

Table 1

CREATE TABLE `values`(
  `id` INT,
  `name` VARCHAR(50),
  `num_addr` BIGINT UNSIGNED
);

Table 2

CREATE TABLE `ranges`(
  `id` INT,
  `range_name` VARCHAR(50),
  `range_start` BIGINT UNSIGNED,
  `range_end` BIGINT UNSIGNED,
  INDEX `idx_start` (`range_start` ASC),
  INDEX `idx_end` (`range_end` ASC),
  INDEX `idx_range` (`range_start` ASC, `range_end` ASC)
);

The query:

SELECT
  `v`.`name`,
  `v`.`num_addr`,
  `r`.`range_name`
FROM
  `values` `v`
  LEFT JOIN `ranges` `r` ON `v`.`num_addr` BETWEEN `range_start` AND `range_end`

An EXPLAIN EXTENDED for the query shows that no index is used, with the information 'Range checked for each record (index map: 0x7)'. This is a performance issue as the ranges table has over 500,000 rows and the query is time sensitive.

# id, select_type, table, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, SIMPLE, r, ALL, idx_start,idx_end,idx_range, , , , 1, 100.00, Range checked for each record (index map: 0x7)

A FORCE INDEX ON JOIN actually makes things worse as the optimizer only sees the suggested index, but does not use it.

Is there a way to use an index on such joins?

Additional notes:

  1. Changing the BETWEEN to value >= range_start AND value <= range_end does not change the execution plan.
  2. Removing idx_start and idx_end indexes does not improve the situation.
  3. Adding an index to num_addr does not affect the join execution.
  4. The intervals (range_start, range_end) can overlap.
  5. There are num_addr that are not in a range.
  6. A good analogy is phone numbers: In the ranges table there would be a record: ('UK', 44000000000000, 44999999999999) and another ('UK Vodafone', 44700000000000, 44799999999999).

Best Answer

Overlapping ranges is an especially difficult problem to optimize. However, here is a technique that provides significant performance, but it requires major change to the schema.

Add another table; let's call it Prefixes. It contains two columns:

prefix DECIMAL(4,0) NOT NULL,
range_id INT,   -- for JOINing to `ranges`
PRIMARY KEY(prefix)

Then, to look for the number 4432109..., first look up 4432 in Prefixes. This will lead to one or morerange.id` values for checking in your existing table.

Note that, for your example, there will 100 entries in Prefixes for 'UK' and 10 entries for 'UK Vodafone'. This implies extra code to maintain Prefixes.

A variant on this is to move the rest of the ranges columns into my new table, and get rid of the table ranges. (Whether to do this depends on the number and nature of the columns, the hassle in the code, the size of the 'prefix', the size of the table(s), etc, etc.)