SQL Server – Best Query to Pick Row Based on IP Address and DateStamp

Tags: sql-server, sql-server-2008, t-sql

Suppose I have Table1 with account_id, ip_address and created_date columns, while another table, Table2, has only ip_address and created_date columns. Table2 has no account_id that I can use to join to Table1. For reasons that come from management, I still need to infer or pick account_ids from Table1 to fill in on Table2.

What would be the best way to pick account_ids from Table1?
I tried the query below, but it returns accounts from Table1 that obviously don't belong to the Table2 row, judging by the created_date values. I wonder if there are ways to do this, such as comparing YEAR(created_date) or similar?

SELECT t2.ip_address, t2.created_date, t1.account_id 
FROM Table2 t2 
LEFT JOIN ( SELECT account_id, ip_address, MAX(created_date) created_date
            FROM Table1 t1
            GROUP BY account_id, ip_address  
           ) t1 
on t2.ip_address = t1.ip_address

EDIT: Note that I just want to pick the closest possible date, since IP addresses and dates are not unique in Table1.

Best Answer

I'm not sure whether this sample data covers your actual case.

CREATE TABLE Table1(account int, ip_address varchar(15), created_date date);
INSERT INTO Table1 VALUES
(1, '10.0.0.1', '20170101'),
(1, '10.0.0.1', '20170201'),
(1, '10.0.0.1', '20170301'),
(2, '10.0.0.2', '20170201'),
(3, '10.0.0.3', '20170201');

CREATE TABLE Table2(ip_address varchar(15), created_date date);
INSERT INTO Table2 VALUES
('10.0.0.1', '20170201'),
('10.0.0.2', '20170201'),
('10.0.0.3', '20170201');

This solution assumes that, for each row in Table2, there is at least one row in Table1 with the same ip_address and a created_date >= the created_date in Table2:

SELECT (SELECT TOP (1) account
        FROM     Table1 t1
        WHERE    t1.ip_address = t2.ip_address
        AND      t1.created_date >= t2.created_date
        ORDER BY t1.ip_address, t1.created_date desc) account,
       t2.ip_address, 
       t2.created_date
FROM   Table2 t2

account | ip_address | created_date       
------: | :--------- | :------------------
      1 | 10.0.0.1   | 01/02/2017 00:00:00
      2 | 10.0.0.2   | 01/02/2017 00:00:00
      3 | 10.0.0.3   | 01/02/2017 00:00:00
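
If the goal is the closest date in either direction, rather than a date on or after the Table2 row, one possible sketch is to order the candidates by the absolute day difference. The table variables and their rows below are made up for illustration and are not the asker's data; OUTER APPLY keeps Table2 rows whose IP has no match, with a NULL account.

```sql
-- Hypothetical sample data: the same IP was used by two accounts at different times.
DECLARE @Table1 TABLE (account int, ip_address varchar(15), created_date date);
DECLARE @Table2 TABLE (ip_address varchar(15), created_date date);

INSERT INTO @Table1 VALUES
(1, '10.0.0.1', '20170101'),
(2, '10.0.0.1', '20170601');

INSERT INTO @Table2 VALUES
('10.0.0.1', '20170215'),
('10.0.0.1', '20170520'),
('10.0.0.9', '20170520');   -- no row in @Table1 for this IP

SELECT t2.ip_address,
       t2.created_date,
       c.account
FROM   @Table2 t2
OUTER APPLY (SELECT TOP (1) t1.account
             FROM   @Table1 t1
             WHERE  t1.ip_address = t2.ip_address
             -- smallest distance in days first; ties go to the later Table1 date
             ORDER BY ABS(DATEDIFF(day, t1.created_date, t2.created_date)),
                      t1.created_date DESC) c;
```

With this made-up data the first row should resolve to account 1 (45 days away versus 106), the second to account 2 (12 days versus 139), and the unmatched IP to NULL. Breaking ties towards the later date is an arbitrary choice.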

If there is no exact date match, you can build a list of periods (note that LEAD requires SQL Server 2012 or later):

WITH ipList as
(
    SELECT account, ip_address, created_date, 
           COALESCE(LEAD(created_date) OVER (PARTITION BY account, ip_address ORDER BY created_date), created_date) next_date
    FROM   Table1
)
select * from iplist;
GO

account | ip_address | created_date        | next_date          
------: | :--------- | :------------------ | :------------------
      1 | 10.0.0.1   | 01/01/2017 00:00:00 | 01/02/2017 00:00:00
      1 | 10.0.0.1   | 01/02/2017 00:00:00 | 01/03/2017 00:00:00
      1 | 10.0.0.1   | 01/03/2017 00:00:00 | 01/03/2017 00:00:00
      2 | 10.0.0.2   | 01/02/2017 00:00:00 | 01/02/2017 00:00:00
      3 | 10.0.0.3   | 01/02/2017 00:00:00 | 01/02/2017 00:00:00

And then try to find the best match:

WITH ipList as
(
    SELECT account, ip_address, created_date, 
           COALESCE(LEAD(created_date) OVER (PARTITION BY account, ip_address ORDER BY created_date), created_date) next_date
    FROM   Table1
)
SELECT    (SELECT TOP(1) ipList.account
           FROM   ipList
           WHERE  t2.ip_address = ipList.ip_address
           AND    t2.created_date >= ipList.created_date 
           AND    t2.created_date <=  ipList.next_date) account,
          t2.ip_address, 
          t2.created_date
FROM      Table2 t2;
GO

account | ip_address | created_date       
------: | :--------- | :------------------
      1 | 10.0.0.1   | 01/02/2017 00:00:00
      2 | 10.0.0.2   | 01/02/2017 00:00:00
      3 | 10.0.0.3   | 01/02/2017 00:00:00
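
Since the question is tagged sql-server-2008 and LEAD was only introduced in SQL Server 2012, here is a rough sketch of how the same period list could be built without it. It swaps LEAD for an OUTER APPLY that looks up the next later created_date for the same account and IP; the table variable simply repeats the sample rows from above. One difference to keep in mind: if two rows share the same created_date, this version skips to the next distinct date, whereas LEAD would return the tied row's date.

```sql
-- Same sample rows as Table1 above, held in a table variable so the batch is self-contained.
DECLARE @Table1 TABLE (account int, ip_address varchar(15), created_date date);

INSERT INTO @Table1 VALUES
(1, '10.0.0.1', '20170101'),
(1, '10.0.0.1', '20170201'),
(1, '10.0.0.1', '20170301'),
(2, '10.0.0.2', '20170201'),
(3, '10.0.0.3', '20170201');

WITH ipList AS
(
    SELECT t1.account,
           t1.ip_address,
           t1.created_date,
           -- fall back to the row's own date when there is no later row
           COALESCE(nx.next_date, t1.created_date) AS next_date
    FROM   @Table1 t1
    OUTER APPLY (SELECT MIN(n.created_date) AS next_date
                 FROM   @Table1 n
                 WHERE  n.account      = t1.account
                 AND    n.ip_address   = t1.ip_address
                 AND    n.created_date > t1.created_date) nx
)
SELECT account, ip_address, created_date, next_date
FROM   ipList
ORDER BY account, ip_address, created_date;
```

On these rows the result should match the period list shown earlier, so the matching query can be run on top of this CTE unchanged.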

Related Solutions

SQL Server Row Differences – How to Show Rows Different Between Two Tables or Queries

You don't need 30 join conditions for a FULL OUTER JOIN here.

You can just Full Outer Join on the PK, preserve rows with at least one difference with WHERE EXISTS (SELECT A.* EXCEPT SELECT B.*) and use CROSS APPLY (SELECT A.* UNION ALL SELECT B.*) to unpivot out both sides of the JOINed rows into individual rows.

WITH TableA(Col1, Col2, Col3) 
     AS (SELECT 'Dog',1,1     UNION ALL 
         SELECT 'Cat',27,86   UNION ALL 
         SELECT 'Cat',128,92), 
     TableB(Col1, Col2, Col3) 
     AS (SELECT 'Dog',1,1     UNION ALL 
         SELECT 'Cat',27,105  UNION ALL 
         SELECT 'Lizard',83,NULL) 
SELECT CA.*
FROM   TableA A 
       FULL OUTER JOIN TableB B 
         ON A.Col1 = B.Col1 
            AND A.Col2 = B.Col2 
/*Unpivot the joined rows*/
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
             SELECT 'TableB' AS what, B.*) AS CA     
/*Exclude identical rows*/
WHERE  EXISTS (SELECT A.* 
               EXCEPT 
               SELECT B.*) 
/*Discard NULL extended row*/
AND CA.Col1 IS NOT NULL      
ORDER BY CA.Col1, CA.Col2

Gives

what   Col1   Col2        Col3
------ ------ ----------- -----------
TableA Cat    27          86
TableB Cat    27          105
TableA Cat    128         92
TableB Lizard 83          NULL
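
The reason for writing the comparison as EXISTS (SELECT A.* EXCEPT SELECT B.*) rather than a chain of <> tests is that EXCEPT treats NULL as just another value. A minimal illustration, using two made-up variables in place of real columns:

```sql
-- @a and @b stand in for a column value on each side of the join.
DECLARE @a int = NULL, @b int = 105;

SELECT CASE WHEN @a <> @b
            THEN 'differs' ELSE 'no difference seen' END AS using_not_equal,
       CASE WHEN EXISTS (SELECT @a EXCEPT SELECT @b)
            THEN 'differs' ELSE 'no difference seen' END AS using_except;
```

Here @a <> @b evaluates to UNKNOWN, so the first column reports 'no difference seen', while the EXCEPT form reports 'differs'. That is also why the Lizard row, which exists on one side only, survives the WHERE clause.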

Or a version dealing with the moved goalposts.

SELECT DISTINCT CA.*
FROM   TableA A 
       FULL OUTER JOIN TableB B 
         ON EXISTS (SELECT A.*  INTERSECT  SELECT B.*) 
CROSS APPLY (SELECT 'TableA' AS what, A.* UNION ALL
             SELECT 'TableB' AS what, B.*) AS CA     
WHERE NOT EXISTS (SELECT A.*  INTERSECT  SELECT B.*) 
AND CA.Col1 IS NOT NULL
ORDER BY CA.Col1, CA.Col2

For tables with many columns it can still be difficult to identify the specific column(s) that differ. For that you can potentially use the below.

(though just on relatively small tables as otherwise this method likely won't have adequate performance)

SELECT t1.primary_key,
       y1.c,
       y1.v,
       y2.v
FROM   t1
       JOIN t2
         ON t1.primary_key = t2.primary_key
       CROSS APPLY (SELECT t1.*
                    FOR xml path('row'), elements xsinil, type) x1(x)
       CROSS APPLY (SELECT t2.*
                    FOR xml path('row'), elements xsinil, type) x2(x)
       CROSS APPLY (SELECT n.n.value('local-name(.)', 'sysname'),
                           n.n.value('.', 'nvarchar(max)')
                    FROM   x1.x.nodes('row/*') AS n(n)) y1(c, v)
       CROSS APPLY (SELECT n.n.value('local-name(.)', 'sysname'),
                           n.n.value('.', 'nvarchar(max)')
                    FROM   x2.x.nodes('row/*') AS n(n)) y2(c, v)
WHERE  y1.c = y2.c
       AND EXISTS(SELECT y1.v
                  EXCEPT
                  SELECT y2.v)
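
If the column list is known and short, a different technique from the XML one above is to unpivot with CROSS APPLY (VALUES ...), which works on SQL Server 2008 and later. This is only a sketch: the table variables, the name and qty columns and their rows are hypothetical, and every column has to be listed and cast by hand.

```sql
-- Hypothetical tables with the same shape on both sides.
DECLARE @t1 TABLE (primary_key int PRIMARY KEY, name varchar(10), qty int);
DECLARE @t2 TABLE (primary_key int PRIMARY KEY, name varchar(10), qty int);

INSERT INTO @t1 VALUES (1, 'Dog', 1), (2, 'Cat', 86);
INSERT INTO @t2 VALUES (1, 'Dog', 1), (2, 'Cat', 105);

SELECT a.primary_key,
       d.c,      -- column name
       d.v1,     -- value in @t1
       d.v2      -- value in @t2
FROM   @t1 a
       JOIN @t2 b
         ON a.primary_key = b.primary_key
       -- one row per column, with both values cast to a common type
       CROSS APPLY (VALUES ('name', CAST(a.name AS nvarchar(max)), CAST(b.name AS nvarchar(max))),
                           ('qty',  CAST(a.qty  AS nvarchar(max)), CAST(b.qty  AS nvarchar(max)))
                   ) d(c, v1, v2)
WHERE  EXISTS (SELECT d.v1
               EXCEPT
               SELECT d.v2);
```

For the rows above this should return a single line: primary_key 2, column qty, values 86 and 105.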

Sql-server – Why aren’t primary key / foreign key matches used for joins

In many cases, there is more than one way to join two tables; see the other answers for lots of examples. Of course, one could say that it would be an error to use the 'automatic join' in those cases. Then only a handful of simple cases where it can be used would be left.

However, there is a severe drawback! Queries that are correct today might become an error tomorrow just by adding a second FK to the same table!

Let me say that again: by adding columns, queries that do not use those columns could turn from 'correct' into 'error'!
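
To make that concrete, here is a small sketch. The customers and orders tables and the billed_to_id column are invented for illustration, and the 'automatic join' itself is hypothetical, so it appears only in comments.

```sql
CREATE TABLE customers (customer_id int PRIMARY KEY);

CREATE TABLE orders
(
    order_id    int PRIMARY KEY,
    customer_id int NOT NULL REFERENCES customers (customer_id)
);

-- Today there is exactly one FK from orders to customers, so a hypothetical
-- key-based join would have only one candidate. The explicit form:
SELECT o.order_id
FROM   orders o
       JOIN customers c
         ON c.customer_id = o.customer_id;

-- Tomorrow someone adds a second FK to the same table.
ALTER TABLE orders
    ADD billed_to_id int NULL REFERENCES customers (customer_id);

-- The explicit join above still means the same thing and never mentions
-- billed_to_id. A key-based join between orders and customers would now be
-- ambiguous: customer_id or billed_to_id?
SELECT o.order_id
FROM   orders o
       JOIN customers c
         ON c.customer_id = o.customer_id;
```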

That is such a maintenance nightmare that any sane style guide would prohibit using this feature. Most already prohibit select * for the same reason!

All this would be acceptable if performance were enhanced. However, that's not the case.

Summarizing, this feature could be used in only a limited set of simple cases, does not increase performance, and most style guides would prohibit its usage anyway.

Therefore it is not surprising that most database vendors choose to spend their time on more important things.
