MySQL – How to Use a Value from SELECT in a Joined Subquery

join;MySQLselectsubquery

I'm trying to avoid a stored procedure or bringing it into code but they may be my only alternatives. Here's my sql. Does anyone know any SQL voodoo that could make this happen without a stored procedure?

SELECT first_name,
       last_name,
       ip_address,
       ip_information.isp,
       count(*) duplicate_records
FROM   customers
LEFT OUTER JOIN (select * from ip_info.ip_address_data
                     where ip_to > INET_ATON(ip_address) limit 1
                ) ip_information using(ip_address)
GROUP BY 1,2,3,4
ORDER BY 1,2,3,4;

It's the LEFT OUTER JOIN that I'm trying to get data from a table that has ip address ranges and not exact ips.

CREATE TABLE `customers` (
  `customer_log_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
  `customer_id` int(11) DEFAULT NULL,
  `first_name` varchar(255) DEFAULT NULL,
  `last_name` varchar(255) DEFAULT NULL,
  `ip_address` varchar(20) DEFAULT NULL,
  `created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (`customer_log_id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;

CREATE TABLE `ip_address_data` (
    `ip_from` int(10) unsigned NOT NULL,
    `ip_to` int(10) unsigned NOT NULL DEFAULT '0',
    `country_code` char(2) DEFAULT NULL,
    `country_name` varchar(64) DEFAULT NULL,
    `region_name` varchar(128) DEFAULT NULL,
    `city_name` varchar(128) DEFAULT NULL,
    `latitude` double DEFAULT NULL,
    `longitude` double DEFAULT NULL,
    `zip_code` varchar(30) DEFAULT NULL,
    `time_zone` varchar(8) DEFAULT NULL,
    `isp` varchar(256) DEFAULT NULL,
    `domain` varchar(128) DEFAULT NULL,
    `net_speed` varchar(8) DEFAULT NULL,
    `idd_code` varchar(5) DEFAULT NULL,
    `area_code` varchar(30) DEFAULT NULL,
    PRIMARY KEY (`ip_to`),
    KEY `isp` (`isp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Explanation in words, I'm trying to select all the records from our customers and identify which ISP they're coming from. The main table is customers and the isp data is in the ip_info.ip_address_data table. The data in the customers table has the ip address the customer last used to login to the system. The other table doesn't have every ip address in the list, instead they have ranges of ips using ip_to and ip_from columns. Normally someone would do a query like this to get the data:

SELECT * FROM ip_info.ip_address_data
    WHERE ip_to   > INET_ATON('1.1.1.1')
      and ip_from < INET_ATON('1.1.1.1') 

But since I have the ip_to as the primary key it's much faster to just do ip_to > INET_ATON('1.1.1.1') and use limit so it stops returning records after the first row is found vs having to make sure it's < ip_from

Having the range in the table to join definitely makes the query more challenging.

INSERT INTO `customers` (`customer_log_id`, `customer_id`, `first_name`, `last_name`, `ip_address`, `created_at`)
VALUES
(1, 1, 'John', 'Smith', '1.0.0.1', '2016-03-11 11:59:47'),
    (2, 1, 'John', 'Smith', '1.0.0.2', '2016-03-11 11:59:52'),
    (3, 1, 'John', 'Smith', '1.0.0.3', '2016-03-11 11:59:53'),
    (4, 1, 'John', 'Smith', '1.0.0.4', '2016-03-11 11:59:55'),
    (5, 1, 'John', 'Smith', '1.0.0.1', '2016-03-11 11:59:48'),
    (6, 1, 'John', 'Smith', '1.0.0.1', '2016-03-11 11:59:49');


INSERT INTO `ip_address_data` (`ip_from`, `ip_to`, `country_code`, `country_name`, `region_name`, `city_name`, `latitude`, `longitude`, `zip_code`, `time_zone`, `isp`, `domain`, `net_speed`, `idd_code`, `area_code`, `weather_station_code`, `weather_station_name`, `mcc`, `mnc`, `mobile_brand`, `elevation`, `usage_type`)
VALUES
(0, 16777215, '-', '-', '-', '-', 0, 0, '-', '-', 'Broadcast RFC1700', '-', '-', '-', '-', '-', '-', '-', '-', '-', 0, 'RSV'),
    (16777216, 16777471, 'AU', 'Australia', 'Queensland', 'Brisbane', -27.46794, 153.02809, '4000', '+10:00', 'Research Prefix for APNIC Labs', 'apnic.net', 'T1', '61', '07', 'ASXX0016', 'Brisbane', '-', '-', '-', 39, 'DCH'),
    (16777472, 16778239, 'CN', 'China', 'Fujian', 'Fuzhou', 26.06139, 119.30611, '350004', '+08:00', 'ChinaNet Fujian Province Network', 'chinatelecom.com.cn', 'DSL', '86', '0591', 'CHXX0031', 'Fuzhou', '460', '03', 'China Telecom', 12, 'ISP/MOB'),
    (16778240, 16778495, 'AU', 'Australia', 'Victoria', 'Melbourne', -37.814, 144.96332, '8010', '+11:00', 'Golden IT Pty Ltd', 'goldenit.net.au', 'DSL', '61', '03', 'ASXX0075', 'Melbourne', '-', '-', '-', 25, 'COM'),
    (16778496, 16779007, 'AU', 'Australia', 'New South Wales', 'Sydney', -33.86785, 151.20732, '2000', '+11:00', 'Golden IT Pty Ltd', 'goldenit.net.au', 'DSL', '61', '02', 'ASXX0112', 'Sydney', '-', '-', '-', 69, 'COM'),
    (16779008, 16779263, 'AU', 'Australia', 'Victoria', 'Melbourne', -37.814, 144.96332, '8010', '+11:00', 'Golden IT Pty Ltd', 'goldenit.net.au', 'DSL', '61', '03', 'ASXX0075', 'Melbourne', '-', '-', '-', 25, 'COM');

Best Answer

You probably have a bug: It should be ip_to >= instead of ip_to > -- to cover the case where ip_address is the same as ip_to.

You must make sure that there are no overlapping ranges in ip_information; otherwise you will occasionally get strange errors.

And, as Lee pointed out, you do need

and ipd.ip_from <= INET_ATON(c1.ip_address)

Otherwise, you could fall into a gap.

But, none of that addresses the question. You need a "correlated subquery", which cannot be done with a subquery in FROM/JOIN. So, let's do...

  1. SELECT DISTINCT ip_address FROM customers ... This is an optimization so that we won't be repeatedly looking up the same ips.
  2. Reach into ip_address_data to find a set of ip_to values.
  3. Now reach back into both tables to get the rest of the info.

Something like this... (I have left out the necessary INET_ATON calls -- suggest you make your data consistent.)

First, Steps 1 and 2:

SELECT  c.ip_address, 
      ( SELECT  MIN(ip_to)
            FROM  ip_address_data
            WHERE  ip_to >= c.ip_address
            ORDER BY  ip_to
            LIMIT  1 
      ) AS ip_to
    FROM  
      ( SELECT  DISTINCT ip_address
            FROM  customers
            WHERE  ... 
      ) AS c;

See if that correctly delivers pairs of ip_address and ip_to. Yes, we have lost the customer info and ip_from; Step 3 will take care of that.

SELECT  c2..., a2...
    FROM  ( as_above ) AS b
    JOIN  customers AS c2 USING(ip_address)
    JOIN  ip_address_data AS a2 USING(ip_to)
    WHERE a2.ip_from <= b.ip_address
    GROUP BY ...
    ORDER BY ...;

Index needed:

customers: INDEX(ip_address)

One more thing... IPv6 is upon us; your table cannot handle such.