Sql-server – Select from multiple rows without duplicate values, with all random data

duplication, query, sql-server

I have a table that gets results from three different sources. Each column represents a source, and each row a result of an outcome. There are over 50k rows for a total of 150k results.

I need to run a report on these results that removes the duplicates and leaves only the unique values, each in its respective column. The majority of the results are duplicates; I would estimate that only around 500 are unique.

The other 'remove duplicates from multiple columns' posts haven't worked for me; I have not been able to get any combination of DISTINCT, GROUP BY, and UNION to work.

Example of data below. Thanks.

Raw Data:

Expected Results:

Squiggles:

Best Answer

I broke this down using pivot and not exists. I really would handle this in the presentation layer though.

--load test data
declare @table table (c1 int, c2 int, c3 int)
insert into @table
values
(1,1,1)
,(1,1,1)
,(2,3,2)
,(4,2,4)
,(5,4,6)
,(7,5,8)
,(9,7,11)
,(11,9,13)
,(14,16,15)

--get our unique values in a cte to pivot later
;with cte as(
select 
    --here we add a RN so that we can use pivot without losing values
    r = row_number() over (partition by Col order by (select 1))
    ,i.*
from
    (
    --for each column, we get the unique values where they don't exist in the other two columns
    --we union them together, but give them 1 /2 / 3 column identifier
    select
        1 as Col, c1.c1 as val
    from
        (select distinct t1.c1 from @table t1
         where  not exists (select 1 from @table t2 where t2.c2 = t1.c1)
            and not exists (select 1 from @table t3 where t3.c3 = t1.c1)) c1
    union
    select 
        2 as col, c2.c2
    from
        (select distinct t1.c2 from @table t1
         where  not exists (select 1 from @table t2 where t2.c1 = t1.c2)
            and not exists (select 1 from @table t3 where t3.c3 = t1.c2)) c2 
    union
    select
        3 as col, c3.c3
    from
        (select distinct t1.c3 from @table t1
         where  not exists (select 1 from @table t2 where t2.c1 = t1.c3)
            and not exists (select 1 from @table t3 where t3.c2 = t1.c3)) c3
    ) i
)


--simple pivot
select
    [1], [2], [3]
from cte 
pivot
(max(val) for Col in ([1],[2],[3]))
p

RETURNS

+------+------+----+
|  1   |  2   | 3  |
+------+------+----+
| 14   | 3    |  6 |
| NULL | 16   |  8 |
| NULL | NULL | 13 |
| NULL | NULL | 15 |
+------+------+----+
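
For comparison, here is a minimal sketch of what handling this in the presentation layer could look like, assuming the rows have already been fetched into Python as tuples. The function names `unique_per_column` and `as_report_rows` are hypothetical and not part of the original answer; the sample rows are the same test data loaded above.

```python
from itertools import zip_longest

# Same sample rows as the T-SQL test data above: (c1, c2, c3).
rows = [
    (1, 1, 1), (1, 1, 1), (2, 3, 2), (4, 2, 4), (5, 4, 6),
    (7, 5, 8), (9, 7, 11), (11, 9, 13), (14, 16, 15),
]

def unique_per_column(rows):
    """Return, per column, the sorted values that appear in no other column."""
    columns = [set(col) for col in zip(*rows)]
    result = []
    for i, values in enumerate(columns):
        # Everything seen in any column other than column i.
        others = set().union(*(c for j, c in enumerate(columns) if j != i))
        result.append(sorted(values - others))
    return result

def as_report_rows(unique_columns):
    """Line the per-column lists up side by side, padding short ones with None."""
    return list(zip_longest(*unique_columns))

unique_columns = unique_per_column(rows)
report = as_report_rows(unique_columns)
for line in report:
    print(line)
```

The set difference plays the role of the two NOT EXISTS checks, and `zip_longest` stands in for the row-number-plus-pivot step, with `None` where the SQL result shows NULL.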

Related Solutions

Sql-server – Finding rows with duplicate values

This should tag the records that need attention. I put the tagging in the SELECT, but you could easily turn this into a second CTE and simply select out the payments to clean up.

-- 
-- find all accounts with more than one payment and mark payments to cancel
--
WITH cte_DuplicatePayments AS
(
SELECT COUNT(*) OVER(PARTITION BY accountID) AS numberOfPaymentsPerAccountID
, COUNT(*) OVER(partition BY accountID, amount) AS numberOfPaymentsPerAccountIDAndAmount
, ROW_NUMBER() OVER(partition BY accountID ORDER BY amount asc) AS PaymentsNumberPerAccountID
, *
FROM ScheduledPayment
)
SELECT CASE 
    WHEN numberOfPaymentsPerAccountID != numberOfPaymentsPerAccountIDAndAmount THEN 'MARK AS CANCELLED: Duplicate Payments with amount mismatch' 
    WHEN PaymentsNumberPerAccountID > 1 THEN 'MARK AS CANCELLED: Duplicate Payments with matching amount' 
    ELSE ''
   END AS PaymentAuditAction
, ScheduledPaymentID, accountID, amount
FROM cte_DuplicatePayments
WHERE numberOfPaymentsPerAccountID > 1
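
To see how the tagging behaves, here is a small sketch that runs essentially the same query against an in-memory SQLite database from Python. This is only an illustration under assumptions: the sample payments are made up, the column types are guesses, and SQLite (3.25 or later, for window functions) stands in for SQL Server.

```python
import sqlite3

# Hypothetical sample data: (ScheduledPaymentID, accountID, amount).
payments = [
    (1, 100, 50),  # account 100: two payments, same amount
    (2, 100, 50),
    (3, 200, 20),  # account 200: two payments, different amounts
    (4, 200, 25),
    (5, 300, 10),  # account 300: single payment, nothing to flag
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ScheduledPayment "
    "(ScheduledPaymentID INTEGER, accountID INTEGER, amount INTEGER)"
)
conn.executemany("INSERT INTO ScheduledPayment VALUES (?, ?, ?)", payments)

query = """
WITH cte_DuplicatePayments AS
(
SELECT COUNT(*) OVER(PARTITION BY accountID) AS numberOfPaymentsPerAccountID
, COUNT(*) OVER(PARTITION BY accountID, amount) AS numberOfPaymentsPerAccountIDAndAmount
, ROW_NUMBER() OVER(PARTITION BY accountID ORDER BY amount ASC) AS PaymentsNumberPerAccountID
, *
FROM ScheduledPayment
)
SELECT CASE
    WHEN numberOfPaymentsPerAccountID != numberOfPaymentsPerAccountIDAndAmount THEN 'MARK AS CANCELLED: Duplicate Payments with amount mismatch'
    WHEN PaymentsNumberPerAccountID > 1 THEN 'MARK AS CANCELLED: Duplicate Payments with matching amount'
    ELSE ''
   END AS PaymentAuditAction
, ScheduledPaymentID, accountID, amount
FROM cte_DuplicatePayments
WHERE numberOfPaymentsPerAccountID > 1
"""

# Collect the audit action of every returned payment, grouped by account.
actions_by_account = {}
for action, payment_id, account_id, amount in conn.execute(query):
    actions_by_account.setdefault(account_id, []).append(action)
    print(account_id, payment_id, amount, repr(action))
```

With this data, account 100 keeps one payment untagged and tags the other as a matching duplicate, while both payments on account 200 are tagged as an amount mismatch. Because the ROW_NUMBER ordering uses only `amount`, which of two equal-amount payments stays untagged is not deterministic.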

Mysql – How to find duplicate values in multiple columns over different rows

Not sure why the question popped up now, but if you are still interested in an answer, something like:

select g1.* 
from games g1 
where exists ( 
    select 1 
    from games g2 
    where g1.gameid <> g2.gameid 
      and least(g1.hometeam,g1.awayteam) 
        = least(g2.hometeam,g2.awayteam) 
      and greatest(g1.hometeam,g1.awayteam) 
        = greatest(g2.hometeam,g2.awayteam) 
      and abs(datediff(g1.d, g2.d)) < 2
);

should give you what you need
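
The key idea in that query is that LEAST and GREATEST turn each (hometeam, awayteam) pair into an order-independent key, so A vs B matches B vs A. A rough Python sketch of the same logic, using made-up sample games and hypothetical helper names (`pair_key`, `near_duplicates`), might look like this:

```python
from datetime import date

# Hypothetical sample data mirroring the columns used in the query.
games = [
    {"gameid": 1, "hometeam": "A", "awayteam": "B", "d": date(2020, 1, 1)},
    {"gameid": 2, "hometeam": "B", "awayteam": "A", "d": date(2020, 1, 2)},
    {"gameid": 3, "hometeam": "A", "awayteam": "C", "d": date(2020, 1, 1)},
    {"gameid": 4, "hometeam": "A", "awayteam": "B", "d": date(2020, 3, 1)},
]

def pair_key(game):
    """Order-independent key for the two teams (LEAST/GREATEST in the SQL)."""
    return (
        min(game["hometeam"], game["awayteam"]),
        max(game["hometeam"], game["awayteam"]),
    )

def near_duplicates(games, max_days=2):
    """Games for which another game has the same team pair within max_days."""
    return [
        g1
        for g1 in games
        if any(
            g1["gameid"] != g2["gameid"]
            and pair_key(g1) == pair_key(g2)
            and abs((g1["d"] - g2["d"]).days) < max_days
            for g2 in games
        )
    ]

flagged_ids = [g["gameid"] for g in near_duplicates(games)]
print(flagged_ids)
```

Games 1 and 2 are flagged because they involve the same two teams one day apart; game 4 has the same teams but is two months later, and game 3 involves a different pair.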
