Sql-server – Duplicates when checking for two different column values based on a key that is not unique in table

sql-server, sql-server-2012

I want to check to see if two orders were placed for one person within an hour of one another.

Example data

PERSONID | proc_code | TIME
123 | 1 | 4/25/2016 00:01:00
123 | 2 | 4/25/2016 00:01:00
123 | 2 | 4/25/2016 00:02:00
111 | 1 | 4/25/2016 00:01:00
111 | 1 | 4/25/2016 00:03:00
111 | 2 | 4/25/2016 00:01:00

SELECT 
persons.person_id,
order_proc.proc_code,
(CASE WHEN order_proc.proc_code='1'
then case when order_proc2.proc_code='2' and order_proc.time=order_proc2.time
then 'Y'
else 'N'
end
end) as 'Y/N?'
from order_proc
inner join PERSONS on order_proc.PERSON_ID=PERSONS.PERSON_ID
inner join order_proc as order_proc2 on order_proc2.PERSON_ID=PERSONS.PERSON_ID
where order_proc.ordertime=order_proc2.ordertime

I then get duplicate rows if a person has had multiple "2" orders.

What I would like returned is, for every proc_code = 1 order, whether a 2 was ordered at the same time as the 1. Ideally the check would be within an hour of one another, but I can figure that out later.

The furthest I've made it is the code above, but it doesn't show the 1 orders for which no 2 was ordered. The WHERE clause only matches rows where the times are equal, so no rows are returned as 'N'.

What I'd like to see based on the above sample data:

PERSON_ID | Proc_code | Y/N?
123 | 1 | Y
111 | 1 | Y
111 | 1 | N

Is there a way I can do this with a while loop and without duplicates? I'm guessing my duplicates are coming from the second join where I alias order_proc as order_proc2.

I am dealing with millions of rows and joining to several other tables as well to pull different types of information for the persons and orders. I need to check a huge table (order_proc) whose foreign key is an order_procedure_id. There are several persons, each with multiple orders.

Best Answer

Based on what you have shared, this should work:

Table and data

CREATE TABLE #Orders (Personid int, proc_code int, time datetime)

INSERT INTO #Orders VALUES (123, 1,  '4/25/2016 00:01:00')
INSERT INTO #Orders VALUES (123, 2,  '4/25/2016 00:01:00')
INSERT INTO #Orders VALUES (123, 2,  '4/25/2016 00:02:00')
INSERT INTO #Orders VALUES (111, 1,  '4/25/2016 00:01:00')
INSERT INTO #Orders VALUES (111, 1,  '4/25/2016 00:03:00')
INSERT INTO #Orders VALUES (111, 2,  '4/25/2016 00:01:00')

Query

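-- The LEFT JOIN keeps every proc_code = 1 order; b is non-NULL only when
-- the same person has a proc_code = 2 order at the same time.
SELECT a.Personid, a.proc_code,
    CASE WHEN b.proc_code = 2 THEN 'Y' ELSE 'N' END AS [Y/N]
FROM #Orders a
LEFT JOIN #Orders b
    ON a.time = b.time
    AND a.Personid = b.Personid
    AND b.proc_code = 2   -- match only code 2, so other codes at the same time add no rows
WHERE a.proc_code = 1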

DROP TABLE #Orders
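
For the "within an hour" version mentioned in the question, one possible approach is an EXISTS check instead of a join. This is only a sketch under the same assumptions as the sample data above (a #Orders temp table with Personid, proc_code and time columns). Because EXISTS only tests whether a match is present, each proc_code = 1 order comes back exactly once, however many "2" orders fall inside the window.

```sql
-- Rebuild the sample data so the script runs on its own.
CREATE TABLE #Orders (Personid int, proc_code int, [time] datetime);

INSERT INTO #Orders VALUES (123, 1, '4/25/2016 00:01:00');
INSERT INTO #Orders VALUES (123, 2, '4/25/2016 00:01:00');
INSERT INTO #Orders VALUES (123, 2, '4/25/2016 00:02:00');
INSERT INTO #Orders VALUES (111, 1, '4/25/2016 00:01:00');
INSERT INTO #Orders VALUES (111, 1, '4/25/2016 00:03:00');
INSERT INTO #Orders VALUES (111, 2, '4/25/2016 00:01:00');

-- One output row per proc_code = 1 order.
-- 'Y' when the same person has any proc_code = 2 order within one hour
-- before or after it (the one-hour window is an assumption; adjust as needed).
SELECT a.Personid,
       a.proc_code,
       CASE WHEN EXISTS (
                SELECT 1
                FROM #Orders AS b
                WHERE b.Personid = a.Personid
                  AND b.proc_code = 2
                  AND b.[time] >= DATEADD(HOUR, -1, a.[time])
                  AND b.[time] <= DATEADD(HOUR,  1, a.[time])
            ) THEN 'Y' ELSE 'N' END AS [Y/N]
FROM #Orders AS a
WHERE a.proc_code = 1;

DROP TABLE #Orders;
```

With the sample data, all three proc_code = 1 rows come back as 'Y' here, because person 111's 00:03 order is only two minutes after that person's "2" order. On a table with millions of rows, an index on (Personid, proc_code, time) would likely help this kind of lookup, though that depends on the real schema.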

Related Solutions

Sql-server – Finding missing gaps of data in a table with ~2.5 Million rows

There is no need to generate dates.

The following query will give you a list of SHORTCODES with no rows at all:

select SHORTCODE from shortcodes
except
select SHORTCODE from VWTBL_INDICATOR

The following query will give you the continuous ranges of MonthYear per SHORTCODE.

select      SHORTCODE
            ,min(MonthYear) as from_MonthYear
            ,max(MonthYear) as to_MonthYear
            ,count(*)       as months

from       (SELECT   SHORTCODE
                    ,MonthYear
                    ,row_number() over (partition by SHORTCODE order by MonthYear)  as rn

            From     VWTBL_INDICATOR
            ) t

group by    SHORTCODE
            ,DATEADD(month,-rn,MonthYear)   

order by    SHORTCODE
            ,from_MonthYear
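
To see why grouping by DATEADD(month,-rn,MonthYear) isolates each continuous range, it may help to look at the intermediate values. The sketch below uses made-up sample rows (code 'A', months Jan to Mar 2016 and then Jun 2016) and assumes MonthYear holds one date per month per SHORTCODE, stored as the first of the month; neither assumption comes from the original question.

```sql
-- Hypothetical sample data: 'A' has Jan-Mar 2016, then a gap, then Jun 2016.
SELECT   SHORTCODE
        ,MonthYear
        ,rn
        ,DATEADD(month, -rn, MonthYear) AS island_key
FROM    (SELECT  SHORTCODE
                ,MonthYear
                ,ROW_NUMBER() OVER (PARTITION BY SHORTCODE ORDER BY MonthYear) AS rn
         FROM   (VALUES ('A', CAST('20160101' AS date))
                       ,('A', CAST('20160201' AS date))
                       ,('A', CAST('20160301' AS date))
                       ,('A', CAST('20160601' AS date))
                ) AS v (SHORTCODE, MonthYear)
        ) t
ORDER BY SHORTCODE
        ,MonthYear;
-- island_key per row:
--   2016-01-01, rn 1 -> 2015-12-01
--   2016-02-01, rn 2 -> 2015-12-01
--   2016-03-01, rn 3 -> 2015-12-01
--   2016-06-01, rn 4 -> 2016-02-01
```

Within a run of consecutive months, MonthYear and rn both advance by one per row, so the difference stays constant. After a gap, MonthYear jumps ahead while rn does not, so the key changes and a new group starts. Duplicate MonthYear values for the same SHORTCODE would break this, since rn would advance without the month advancing.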

If you wish you can use the following version which has an additional layer of information:

missing_from_MonthYear + missing_to_MonthYear: the missing range between this range and the next one
ranges: Number of ranges per SHORTCODE (ranges>1 means you have gaps in the middle)
range_seq: the sequential number of each SHORTCODE range
is_first: Indication for the first range per SHORTCODE (check from_MonthYear to see if you are missing preceding dates)
is_last: Indication for the last range per SHORTCODE (check to_MonthYear to see if you are missing following dates)

select      SHORTCODE
           ,from_MonthYear                                                                                  as exists_from_MonthYear
           ,to_MonthYear                                                                                    as exists_to_MonthYear
           ,dateadd (day,1,to_MonthYear)                                                                    as missing_from_MonthYear
           ,dateadd (day,-1,lead (from_MonthYear) over (partition by SHORTCODE order by from_MonthYear))    as missing_to_MonthYear
           ,count       (*) over (partition by SHORTCODE)                                                   as ranges
           ,row_number  ()  over (partition by SHORTCODE order by from_MonthYear)                           as range_seq
           ,case from_MonthYear when min(from_MonthYear) over (partition by SHORTCODE) then 1 end           as is_first
           ,case to_MonthYear   when max(to_MonthYear)   over (partition by SHORTCODE) then 1 end           as is_last

from       (select      SHORTCODE
                       ,min(MonthYear)  as from_MonthYear
                       ,max(MonthYear)  as to_MonthYear
                       ,count(*)        as months

            from       (SELECT      SHORTCODE
                                   ,MonthYear
                                   ,row_number() over (partition by SHORTCODE order by MonthYear)   as rn

                        From        VWTBL_INDICATOR
                        ) t

            group by    SHORTCODE
                       ,DATEADD(month,-rn,MonthYear)    
            ) t

order by    SHORTCODE
           ,from_MonthYear

Sql-server – Using CROSS APPLY with GROUP BY and TOP 1 with duplicate data

If you want the row with the latest Modified value for each ProductNumber, you can use the ROW_NUMBER() function and then keep only the rows where the row number is 1.

WITH selMax AS
(
    SELECT ID, ProductNumber, DateCreated, Modified,
           ROW_NUMBER() OVER (PARTITION BY ProductNumber ORDER BY Modified DESC, 
                                                                  DateCreated DESC) RNum
    FROM   #ProductStatus
)
SELECT ID, ProductNumber, DateCreated, Modified
FROM   selMax
WHERE  RNum = 1
GO

ID | ProductNumber | DateCreated         | Modified           
-: | ------------: | :------------------ | :------------------
11 |      20070098 | 20/03/2009 14:09:52 | 10/10/2014 20:22:59
 3 |      20070099 | 18/12/2008 09:26:58 | 10/12/2014 20:22:59

Filtering by ProductNumber:

WITH selMax AS
(
    SELECT ID, ProductNumber, DateCreated, Modified,
           ROW_NUMBER() OVER (PARTITION BY ProductNumber ORDER BY Modified DESC, 
                                                                  DateCreated DESC) RNum
    FROM   #ProductStatus
    WHERE  ProductNumber = 20070098
)
SELECT ID, ProductNumber, DateCreated, Modified
FROM   selMax
WHERE  RNum = 1
GO

ID | ProductNumber | DateCreated         | Modified           
-: | ------------: | :------------------ | :------------------
11 |      20070098 | 20/03/2009 14:09:52 | 10/10/2014 20:22:59
