I'm trying to avoid a stored procedure or bringing it into code but they may be my only alternatives. Here's my sql. Does anyone know any SQL voodoo that could make this happen without a stored procedure?
SELECT first_name,
last_name,
ip_address,
ip_information.isp,
count(*) duplicate_records
FROM customers
LEFT OUTER JOIN (select * from ip_info.ip_address_data
where ip_to > INET_ATON(ip_address) limit 1
) ip_information using(ip_address)
GROUP BY 1,2,3,4
ORDER BY 1,2,3,4;
It's the LEFT OUTER JOIN that I'm trying to get data from a table that has ip address ranges and not exact ips.
CREATE TABLE `customers` (
`customer_log_id` int(11) unsigned NOT NULL AUTO_INCREMENT,
`customer_id` int(11) DEFAULT NULL,
`first_name` varchar(255) DEFAULT NULL,
`last_name` varchar(255) DEFAULT NULL,
`ip_address` varchar(20) DEFAULT NULL,
`created_at` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`customer_log_id`)
) ENGINE=InnoDB AUTO_INCREMENT=7 DEFAULT CHARSET=latin1;
CREATE TABLE `ip_address_data` (
`ip_from` int(10) unsigned NOT NULL,
`ip_to` int(10) unsigned NOT NULL DEFAULT '0',
`country_code` char(2) DEFAULT NULL,
`country_name` varchar(64) DEFAULT NULL,
`region_name` varchar(128) DEFAULT NULL,
`city_name` varchar(128) DEFAULT NULL,
`latitude` double DEFAULT NULL,
`longitude` double DEFAULT NULL,
`zip_code` varchar(30) DEFAULT NULL,
`time_zone` varchar(8) DEFAULT NULL,
`isp` varchar(256) DEFAULT NULL,
`domain` varchar(128) DEFAULT NULL,
`net_speed` varchar(8) DEFAULT NULL,
`idd_code` varchar(5) DEFAULT NULL,
`area_code` varchar(30) DEFAULT NULL,
PRIMARY KEY (`ip_to`),
KEY `isp` (`isp`)
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
Explanation in words, I'm trying to select all the records from our customers and identify which ISP they're coming from. The main table is customers and the isp data is in the ip_info.ip_address_data table. The data in the customers table has the ip address the customer last used to login to the system. The other table doesn't have every ip address in the list, instead they have ranges of ips using ip_to and ip_from columns. Normally someone would do a query like this to get the data:
SELECT * FROM ip_info.ip_address_data
WHERE ip_to > INET_ATON('1.1.1.1')
and ip_from < INET_ATON('1.1.1.1')
But since I have the ip_to as the primary key it's much faster to just do ip_to > INET_ATON('1.1.1.1') and use limit so it stops returning records after the first row is found vs having to make sure it's < ip_from
Having the range in the table to join definitely makes the query more challenging.
INSERT INTO `customers` (`customer_log_id`, `customer_id`, `first_name`, `last_name`, `ip_address`, `created_at`)
VALUES
(1, 1, 'John', 'Smith', '1.0.0.1', '2016-03-11 11:59:47'),
(2, 1, 'John', 'Smith', '1.0.0.2', '2016-03-11 11:59:52'),
(3, 1, 'John', 'Smith', '1.0.0.3', '2016-03-11 11:59:53'),
(4, 1, 'John', 'Smith', '1.0.0.4', '2016-03-11 11:59:55'),
(5, 1, 'John', 'Smith', '1.0.0.1', '2016-03-11 11:59:48'),
(6, 1, 'John', 'Smith', '1.0.0.1', '2016-03-11 11:59:49');
INSERT INTO `ip_address_data` (`ip_from`, `ip_to`, `country_code`, `country_name`, `region_name`, `city_name`, `latitude`, `longitude`, `zip_code`, `time_zone`, `isp`, `domain`, `net_speed`, `idd_code`, `area_code`, `weather_station_code`, `weather_station_name`, `mcc`, `mnc`, `mobile_brand`, `elevation`, `usage_type`)
VALUES
(0, 16777215, '-', '-', '-', '-', 0, 0, '-', '-', 'Broadcast RFC1700', '-', '-', '-', '-', '-', '-', '-', '-', '-', 0, 'RSV'),
(16777216, 16777471, 'AU', 'Australia', 'Queensland', 'Brisbane', -27.46794, 153.02809, '4000', '+10:00', 'Research Prefix for APNIC Labs', 'apnic.net', 'T1', '61', '07', 'ASXX0016', 'Brisbane', '-', '-', '-', 39, 'DCH'),
(16777472, 16778239, 'CN', 'China', 'Fujian', 'Fuzhou', 26.06139, 119.30611, '350004', '+08:00', 'ChinaNet Fujian Province Network', 'chinatelecom.com.cn', 'DSL', '86', '0591', 'CHXX0031', 'Fuzhou', '460', '03', 'China Telecom', 12, 'ISP/MOB'),
(16778240, 16778495, 'AU', 'Australia', 'Victoria', 'Melbourne', -37.814, 144.96332, '8010', '+11:00', 'Golden IT Pty Ltd', 'goldenit.net.au', 'DSL', '61', '03', 'ASXX0075', 'Melbourne', '-', '-', '-', 25, 'COM'),
(16778496, 16779007, 'AU', 'Australia', 'New South Wales', 'Sydney', -33.86785, 151.20732, '2000', '+11:00', 'Golden IT Pty Ltd', 'goldenit.net.au', 'DSL', '61', '02', 'ASXX0112', 'Sydney', '-', '-', '-', 69, 'COM'),
(16779008, 16779263, 'AU', 'Australia', 'Victoria', 'Melbourne', -37.814, 144.96332, '8010', '+11:00', 'Golden IT Pty Ltd', 'goldenit.net.au', 'DSL', '61', '03', 'ASXX0075', 'Melbourne', '-', '-', '-', 25, 'COM');
Best Answer
You probably have a bug: It should be
ip_to >=
instead ofip_to >
-- to cover the case whereip_address
is the same asip_to
.You must make sure that there are no overlapping ranges in
ip_information
; otherwise you will occasionally get strange errors.And, as Lee pointed out, you do need
Otherwise, you could fall into a gap.
But, none of that addresses the question. You need a "correlated subquery", which cannot be done with a subquery in
FROM
/JOIN
. So, let's do...SELECT DISTINCT ip_address FROM customers ...
This is an optimization so that we won't be repeatedly looking up the same ips.ip_address_data
to find a set ofip_to
values.Something like this... (I have left out the necessary INET_ATON calls -- suggest you make your data consistent.)
First, Steps 1 and 2:
See if that correctly delivers pairs of
ip_address
andip_to
. Yes, we have lost the customer info andip_from
; Step 3 will take care of that.Index needed:
One more thing... IPv6 is upon us; your table cannot handle such.