SQL Server – Best Query to Pick Row Based on IP Address and DateStamp

sql serversql-server-2008t-sql

Supposed I have Table1 that have account_id, ip_address and created_date columns but on the other table, Table2 has only ip_address and created_date columns. Table2 doesn't have an account_id that I can use to join in table1. For some reason from the management, I still need to assume or pick accounts_ids from Table1 to fill it in Table2.

What will be the best way to pick accounts_ids from Table1?
I tried the query below but it seems that there are accounts from Table1 which is obviously doesn't belong to Table2 due to created_date comparison. I wonder if there are ways to do this such as comparing YEAR(create_date) or etc?

SELECT t2.ip_address, t2.created_date, t1.account_id 
FROM Table2 t2 
LEFT JOIN ( SELECT account_id, ip_address, MAX(created_date) created_date
            FROM Table1 t1
            GROUP BY account_id, ip_address  
           ) t1 
on t2.ip_address = t1.ip_address

EDIT: Note that I just want to pick the closest date as possible since IP address and dates are not unique on Table1.

Best Answer

I'm not sure if this sample data cover your current issue.

CREATE TABLE Table1(account int, ip_address varchar(15), created_date date);
INSERT INTO Table1 VALUES
(1, '10.0.0.1', '20170101'),
(1, '10.0.0.1', '20170201'),
(1, '10.0.0.1', '20170301'),
(2, '10.0.0.2', '20170201'),
(3, '10.0.0.3', '20170201');

CREATE TABLE Table2(ip_address varchar(15), created_date date);
INSERT INTO Table2 VALUES
('10.0.0.1', '20170201'),
('10.0.0.2', '20170201'),
('10.0.0.3', '20170201');

This solution assumes that, at least, there is a created_date on your Table1 >= of the created_date on Tabel2

SELECT (SELECT TOP (1) account
        FROM     Table1 t1
        WHERE    t1.ip_address = t2.ip_address
        AND      t1.created_date >= t2.created_date
        ORDER BY t1.ip_address, t1.created_date desc) account,
       t2.ip_address, 
       t2.created_date
FROM   Table2 t2
account | ip_address | created_date       
------: | :--------- | :------------------
      1 | 10.0.0.1   | 01/02/2017 00:00:00
      2 | 10.0.0.2   | 01/02/2017 00:00:00
      3 | 10.0.0.3   | 01/02/2017 00:00:00

dbfiddle here

If there is no an exact date, you can build a list of periods:

WITH ipList as
(
    SELECT account, ip_address, created_date, 
           COALESCE(LEAD(created_date) OVER (PARTITION BY account, ip_address ORDER BY created_date), created_date) next_date
    FROM   Table1
)
select * from iplist;
GO
account | ip_address | created_date        | next_date          
------: | :--------- | :------------------ | :------------------
      1 | 10.0.0.1   | 01/01/2017 00:00:00 | 01/02/2017 00:00:00
      1 | 10.0.0.1   | 01/02/2017 00:00:00 | 01/03/2017 00:00:00
      1 | 10.0.0.1   | 01/03/2017 00:00:00 | 01/03/2017 00:00:00
      2 | 10.0.0.2   | 01/02/2017 00:00:00 | 01/02/2017 00:00:00
      3 | 10.0.0.3   | 01/02/2017 00:00:00 | 01/02/2017 00:00:00

And then try to find the best match:

WITH ipList as
(
    SELECT account, ip_address, created_date, 
           COALESCE(LEAD(created_date) OVER (PARTITION BY account, ip_address ORDER BY created_date), created_date) next_date
    FROM   Table1
)
SELECT    (SELECT TOP(1) ipList.account
           FROM   ipList
           WHERE  t2.ip_address = ipList.ip_address
           AND    t2.created_date >= ipList.created_date 
           AND    t2.created_date <=  ipList.next_date) account,
          t2.ip_address, 
          t2.created_date
FROM      Table2 t2;
GO
account | ip_address | created_date       
------: | :--------- | :------------------
      1 | 10.0.0.1   | 01/02/2017 00:00:00
      2 | 10.0.0.2   | 01/02/2017 00:00:00
      3 | 10.0.0.3   | 01/02/2017 00:00:00

dbfiddle here