I am trying to join two tables on the criteria that the value one column of table1
is between the values of two columns of table2
.
Table 1
CREATE TABLE `values`(
`id` INT,
`name` VARCHAR(50),
`num_addr` BIGINT UNSIGNED
);
Table 2
CREATE TABLE `ranges`(
`id` INT,
`range_name` VARCHAR(50),
`range_start` BIGINT UNSIGNED,
`range_end` BIGINT UNSIGNED,
INDEX `idx_start` (`range_start` ASC),
INDEX `idx_end` (`range_end` ASC),
INDEX `idx_range` (`range_start` ASC, `range_end` ASC)
);
The query:
SELECT
`v`.`name`,
`v`.`num_addr`,
`r`.`range_name`
FROM
`values` `v`
LEFT JOIN `ranges` `r` ON `v`.`num_addr` BETWEEN `range_start` AND `range_end`
An EXPLAIN EXTENDED
for the query shows that no index is used, with the information 'Range checked for each record (index map: 0x7)'. This is a performance issue as the ranges table has over 500,000 rows and the query is time sensitive.
# id, select_type, table, type, possible_keys, key, key_len, ref, rows, filtered, Extra
1, SIMPLE, r, ALL, idx_start,idx_end,idx_range, , , , 1, 100.00, Range checked for each record (index map: 0x7)
A FORCE INDEX ON JOIN
actually makes things worse as the optimizer only sees the suggested index, but does not use it.
Is there a way to use an index on such joins?
Additional notes:
- Changing the
BETWEEN
tovalue >= range_start AND value <= range_end
does not change the execution plan. - Removing
idx_start
andidx_end
indexes does not improve the situation. - Adding an index to
num_addr
does not affect the join execution. - The intervals
(range_start, range_end)
can overlap. - There are
num_addr
that are not in a range. - A good analogy is phone numbers: In the
ranges
table there would be a record:('UK', 44000000000000, 44999999999999)
and another('UK Vodafone', 44700000000000, 44799999999999)
.
Best Answer
Overlapping ranges is an especially difficult problem to optimize. However, here is a technique that provides significant performance, but it requires major change to the schema.
Add another table; let's call it
Prefixes
. It contains two columns:Then, to look for the number
4432109...
, first look up4432
inPrefixes. This will lead to one or more
range.id` values for checking in your existing table.Note that, for your example, there will 100 entries in
Prefixes
for 'UK' and 10 entries for 'UK Vodafone'. This implies extra code to maintainPrefixes
.A variant on this is to move the rest of the
ranges
columns into my new table, and get rid of the tableranges
. (Whether to do this depends on the number and nature of the columns, the hassle in the code, the size of the 'prefix', the size of the table(s), etc, etc.)