MariaDB optimizing geo IP lookup with Maxmind GeoLite2 database

mariadbmariadb-10.5

I have the following two tables that I ingest with data from Maxmind's GeoLite2 Country database:

CREATE TABLE `country_location` (
  `geoname_id` int(11) UNSIGNED NOT NULL,
  `locale_code` varchar(255) DEFAULT NULL,
  `continent_code` char(2) DEFAULT NULL,
  `continent_name` varchar(255) DEFAULT NULL,
  `country_iso_code` char(2) DEFAULT NULL,
  `country_name` varchar(255) DEFAULT NULL,
  `is_in_european_union` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`geoname_id`),
  KEY `idx_country_iso_code` (`country_iso_code`),
  KEY `idx_continent_code` (`continent_code`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

CREATE TABLE `country_blocks` (
  `network` varchar(255) NOT NULL DEFAULT '',
  `network_start_integer` NUMERIC(39, 0) unsigned DEFAULT NULL,
  `network_last_integer` NUMERIC(39, 0) unsigned DEFAULT NULL,
  `geoname_id` int(11) UNSIGNED DEFAULT NULL,
  `registered_country_geoname_id` int(11) DEFAULT NULL,
  `represented_country_geoname_id` int(11) DEFAULT '0',
  `is_anonymous_proxy` tinyint(1) DEFAULT '0',
  `is_satellite_provider` tinyint(1) DEFAULT '0',
  PRIMARY KEY (`network_last_integer`),
  KEY `idx_network_start_integer` (`network_start_integer`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

I have been previously using the following query:

SELECT l.country_iso_code, l.continent_code
    FROM geo.country_location l
        JOIN geo.country_blocks b
    ON (l.geoname_id=b.geoname_id) 
    WHERE network_last_integer >= CONV(HEX(INET6_ATON('1.1.1.1')),16,10)
    ORDER BY network_last_integer LIMIT 1;

This query executes in less than ~1ms which is expected to be fast given the usage of only 1 key.

Unfortunately since MaxMind now adds some additional columns for geoname_id (registered_country_geoname_id, represented_country_geoname_id) there are certain cases (when geoname_id = 0) when the query would produce false positives returning the results of the next network from country_blocks for which the JOIN is triggered. In such cases I need to use registered_country_geoname_id instead of geoname_id (I don't care of represented_country_geoname_id so far).

I am trying to figure out a way of elegantly resolving that issue without any degradation of speed.

I tried the following but unfortunately it takes ~65ms which is quite slow (and explainable since now I am using two keys):

SELECT l.country_iso_code, l.continent_code
    FROM geo.country_location l
        JOIN geo.country_blocks b
        ON (l.geoname_id=b.geoname_id)
        OR (l.geoname_id=b.registered_country_geoname_id)
    WHERE network_last_integer >= CONV(HEX(INET6_ATON('1.1.1.1')),16,10)
    AND network_start_integer <= CONV(HEX(INET6_ATON('1.1.1.1')),16,10);

To optimize this and since selecting with IN with LIMIT from within other queries is not supported with the MariaDB 10.5 I ended up selecting the proper geoname_id first:

SELECT IF(geoname_id = 0, registered_country_geoname_id, geoname_id) as geoname_id
FROM geoip.country_blocks 
WHERE network_last_integer >= CONV(HEX(INET6_ATON('1.1.1.1')),16,10) 
ORDER BY network_last_integer LIMIT 1;

and then using the result to select within country_location.

This is quite fast since I am still using one key in country_blocks, but is there any more elegant way of achieving this?

~Cheers~

Best Answer

OR is inefficient. It can be turned into UNION. An example: http://mysql.rjweb.org/doc.php/index_cookbook_mysql#or

Why use INET6_ATON on IPV4 values?

Finding something in a range is poorly optimized, primarily because the Optimizer does not know whether there are overlapping ranges. One solution is given here: http://mysql.rjweb.org/doc.php/ipranges