Mysql – need help in Query optimization

mysql-5.5

I need help in query optimization,I am having table with structure

Create Table: CREATE TABLE `ip_country_mapping` (
  `id` bigint(20) NOT NULL AUTO_INCREMENT,
  `cron` bigint(20) NOT NULL,
  `start_ip_number` bigint(20) NOT NULL,
  `end_ip_number` bigint(20) NOT NULL,
  `country` varchar(2) COLLATE utf8_bin NOT NULL,
  `state` varchar(30) COLLATE utf8_bin DEFAULT NULL,
  `city` varchar(30) COLLATE utf8_bin DEFAULT NULL,
  `zip` varchar(30) COLLATE utf8_bin DEFAULT NULL,
  `creation_date` datetime NOT NULL,
  `last_updation_date` datetime NOT NULL,
  PRIMARY KEY (`id`),
  UNIQUE KEY `UK_IP_COUNTRY_MAPPING_TEMP_START_IP_NUMBER` (`start_ip_number`),
  UNIQUE KEY `UK_IP_COUNTRY_MAPPING_TEMP_END_IP_NUMBER` (`end_ip_number`),
  KEY `FK_IP_COUNTRY_MAPPING_TEMP_CRON` (`cron`),
  KEY `ind_ipscan` (`end_ip_number`,`start_ip_number`),
  CONSTRAINT `FK_IP_COUNTRY_MAPPING_TEMP_CRON` FOREIGN KEY (`cron`) REFERENCES `cron` (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2020168 DEFAULT CHARSET=utf8 COLLATE=utf8_bin
1 row in set (0.00 sec)

The query to optimized is

SELECT ipcountrym0_.country as col_0_0_, ipcountrym0_.state as col_1_0_,ipcountrym0_.city as col_2_0_ 
FROM ip_country_mapping ipcountrym0_ 
WHERE ipcountrym0_.start_ip_number<=1376791568 
AND ipcountrym0_.end_ip_number>=1376791568;

EXPLAIN plan is giving the output as

EXPLAIN SELECT ipcountrym0_.country as col_0_0_, ipcountrym0_.state as col_1_0_, ipcountrym0_.city as col_2_0_ 
FROM ip_country_mapping ipcountrym0_ 
WHERE ipcountrym0_.start_ip_number<=1376791568 
AND ipcountrym0_.end_ip_number>=1376791568;
    *************************** 1. row ***************************
               id: 1
      select_type: SIMPLE
            table: ipcountrym0_
             type: ALL
    possible_keys: UK_IP_COUNTRY_MAPPING_TEMP_START_IP_NUMBER,UK_IP_COUNTRY_MAPPING_TEMP_END_IP_NUMBER,ind_ipscan
              key: NULL
          key_len: NULL
              ref: NULL
             rows: 2081584
            Extra: Using where

If i add limit 1 to the above query,it scans 199999 rows.The output generated by this query is only one row.if i put order by clause in the above query with limit 1 then it scans only 2 rows.

EXPLAIN SELECT ipcountrym0_.country as col_0_0_, ipcountrym0_.state as col_1_0_, ipcountrym0_.city as col_2_0_ 
FROM ip_country_mapping ipcountrym0_ 
WHERE ipcountrym0_.start_ip_number<=1376791568 
AND countrym0_.end_ip_number>=1376791568 
ORDER BY id LIMIT 1;
        *************************** 1. row ***************************
                   id: 1
          select_type: SIMPLE
                table: ipcountrym0_
                 type: index
        possible_keys: UK_IP_COUNTRY_MAPPING_TEMP_START_IP_NUMBER,UK_IP_COUNTRY_MAPPING_TEMP_END_IP_NUMBER,ind_ipscan
                  key: PRIMARY
              key_len: 8
                  ref: NULL
                 rows: 2
                Extra: Using where
        1 row in set (0.00 sec)

Is this the correct way my query is optimizing or there is some thing wrong the way i had tried to optimize.
My Manager is are not agreeing with this optimization as he consider that the optimizer is first scanning the index column id and then the start ip number and end ip number which is not relevant.

Can somebody please explain how optimizer is working here and is this correct way the optimizing the query .It is argued that optimizer is showing wrong plan.

Best Answer

Your manager seems to be right. Your second query (with the ORDER BY) indeed scans the primary index (key: PRIMARY and type: index in EXPLAIN output), and then checks wheather the start_ip_number and end_ip_number columns satisfy the WHERE condition (Extra: Using where). The amount of rows scanned highly depends on the ip address. In the worst case (no rows matching the value) you will do a table scan anyway.

Your query uses two range conditions, which can't be covered both by a B-tree index, so it has to choose one. To see which index is best, the optimizer estimates how many rows would satisfy the relevant WHERE clause from each index. The general rule of thumb is that if more than 20% of rows would be selected when using an index then it is more efficient to not use the index at all and do a full table scan.

The reason might be that indeed too many rows satisfy each one of the WHERE conditions, but it is also possible that index statistics are not accurate. You can check if the statistics are up to date with SHOW INDEX, and perhaps update them with ANALYZE TABLE. Note that ANALYZE TABLE will lock the table until it finishes

To address your specific issue you can design your table differently, using spatial indexes for the IP range and then use spatial functions to see wheather an address belongs to that range. See this post for more detail. Note that spatial indexes are available only for MyISAM tables. InnoDB supports spatial columns, but you can't have indexes on them.

There are also different approaches, requiring that the ip renges are not overlapping:

  1. Rick James' Blocks of Addresses, such as IP Addresses Reference
  2. Maciej Dobrzański's Implementing efficient Geo IP location system in MySQL