Sql-server – Select from multiple rows without duplicate values, with all random data

duplicationquerysql server

I have a table that gets results from three different sources. Each column represents a source, and each row a result of an outcome. There are over 50k rows for a total of 150k results.

I need to run a report that within these results, I want to remove duplicates leaving the unique values behind, in their respective columns. The majority of the results will all be duplicates, and I would assume around ~500 are unique.

The other 'remove duplicate from multiple columns' posts haven't worked for me; any combo of distinct, groups, and unions I have not been able to get to work.

Example of data below. Thanks.

Raw Data:
Data'r

Expected Results:
Results

Squiggles:
Squiggles

Best Answer

I broke this down using pivot and not exists. I really would handle this in the presentation layer though.

--load test data
declare @table table (c1 int, c2 int, c3 int)
insert into @table
values
(1,1,1)
,(1,1,1)
,(2,3,2)
,(4,2,4)
,(5,4,6)
,(7,5,8)
,(9,7,11)
,(11,9,13)
,(14,16,15)

--get our unique values in a cte to pivot later
;with cte as(
select 
    --here we add a RN so that we can use pivot without losing values
    r = row_number() over (partition by Col order by (select 1))
    ,i.*
from
    (
    --for each column, we get the unique values where they don't exist in the other two columns
    --we union them together, but give them 1 /2 / 3 column identifier
    select
        1 as Col, c1.c1 as val
    from
        (select distinct t1.c1 from @table t1
         where  not exists (select 1 from @table t2 where t2.c2 = t1.c1)
            and not exists (select 1 from @table t3 where t3.c3 = t1.c1)) c1
    union
    select 
        2 as col, c2.c2
    from
        (select distinct t1.c2 from @table t1
         where  not exists (select 1 from @table t2 where t2.c1 = t1.c2)
            and not exists (select 1 from @table t3 where t3.c3 = t1.c2)) c2 
    union
    select
        3 as col, c3.c3
    from
        (select distinct t1.c3 from @table t1
         where  not exists (select 1 from @table t2 where t2.c1 = t1.c3)
            and not exists (select 1 from @table t3 where t3.c2 = t1.c3)) c3
    ) i
)


--simple pivot
select
    [1], [2], [3]
from cte 
pivot
(max(Val) for Col in ([1],[2],[3]))
p

RETURNS

+------+------+----+
|  1   |  2   | 3  |
+------+------+----+
| 14   | 3    |  6 |
| NULL | 16   |  8 |
| NULL | NULL | 13 |
| NULL | NULL | 15 |
+------+------+----+